Causality is receiving increasing attention from the artificial intelligence and machine learning communities. This paper gives an example of modelling a recommender system problem using causal graphs. Specifically, we approached the causal discovery task of learning a causal graph by combining observational data from an open-source dataset with prior knowledge. The resulting causal graph shows that only a few variables effectively influence the analysed feedback signals. This contrasts with the recent trend in the machine learning community of including ever more variables in massive models such as neural networks.
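The abstract does not spell out the discovery procedure. Constraint-based algorithms such as PC build a causal graph from conditional-independence tests; the sketch below shows one such test (a Fisher-z test under Gaussian assumptions), with illustrative function names and thresholds that are not taken from the paper:

```python
import math
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out the conditioning set z."""
    if z.size:
        # residualize x and y on z via least squares
        beta_x, *_ = np.linalg.lstsq(z, x, rcond=None)
        beta_y, *_ = np.linalg.lstsq(z, y, rcond=None)
        x = x - z @ beta_x
        y = y - z @ beta_y
    return float(np.corrcoef(x, y)[0, 1])

def independent(x, y, z, n, alpha=0.05):
    """Fisher-z test: True if x and y look conditionally independent given z."""
    r = partial_corr(x, y, z)
    r = max(min(r, 0.999999), -0.999999)           # guard atanh
    z_stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - z.shape[1] - 3)
    # two-sided p-value against the standard normal
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z_stat) / math.sqrt(2))))
    return p > alpha
```

In a chain X → Y → W, the test should reject marginal independence of X and W but accept it once Y is conditioned on; prior knowledge enters such algorithms as forbidden or required edges that skip tests entirely.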
"Causal Discovery in Recommender Systems: Example and Discussion" — Emanuele Cavenaghi, Fabio Stella, Markus Zanker. arXiv:2409.10271 [cs.IR], 2024-09-16.
In recent years, there has been a surge in the publication of clinical trial reports, making it challenging to conduct systematic reviews. Automatically extracting Population, Intervention, Comparator, and Outcome (PICO) elements from clinical trial studies can alleviate the traditionally time-consuming process of manually scrutinizing reports for systematic reviews. Existing approaches to PICO frame extraction are supervised, relying on manually annotated data points in the form of BIO label tagging. Even recent techniques such as In-Context Learning (ICL), which has proven effective for a number of downstream NLP tasks, require labeled examples. In this work, we adopt an ICL strategy that exploits the knowledge Large Language Models (LLMs) gather during pretraining to extract PICO-related terminology from clinical trial documents in an unsupervised setup, bypassing the need for a large number of annotated instances. Additionally, to showcase the effectiveness of LLMs in the oracle scenario where many annotated samples are available, we adopt an instruction-tuning strategy, employing Low-Rank Adaptation (LoRA) to train a very large model for the PICO frame extraction task in a low-resource environment. Our empirical results show that the proposed ICL-based framework produces comparable results on all versions of the EBM-NLP dataset, and the instruction-tuned version of our framework achieves state-of-the-art results on all of them. Our project is available at https://github.com/shrimonmuke0202/AlpaPICO.git.
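BIO label tagging frames PICO extraction as token classification. A minimal sketch of the decoding step, collapsing BIO tags into labelled spans (the `POP`/`INT` label names are illustrative, not the exact EBM-NLP inventory):

```python
def bio_to_spans(tokens, tags):
    """Collapse BIO tags (e.g. B-POP, I-POP, O) into (label, text) spans."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:                       # close the previous span
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)               # continue the open span
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans
```

For example, tokens `["120", "adults", "received", "metformin", "daily"]` with tags `["B-POP", "I-POP", "O", "B-INT", "O"]` decode to a Population span "120 adults" and an Intervention span "metformin".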
"AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs" — Madhusudan Ghosh, Shrimon Mukherjee, Asmit Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar, Debasis Ganguly. arXiv:2409.09704 [cs.IR], 2024-09-15.
Recency bias in a sequential recommendation system refers to an overly high emphasis placed on recent items within a user session. This bias can diminish the serendipity of recommendations and hinder the system's ability to capture users' long-term interests, leading to user disengagement. We propose a simple yet effective novel metric specifically designed to quantify recency bias. Our findings demonstrate that high recency bias, as measured by the proposed metric, adversely impacts recommendation performance, and that mitigating it improves performance across all models evaluated in our experiments, highlighting the importance of measuring recency bias.
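The abstract does not define the proposed metric, so as a purely hypothetical illustration of the idea, one could quantify recency bias as the share of a model's recommendations drawn from the most recent items in the session:

```python
def recency_bias(session, recommended, k=3):
    """Hypothetical recency-bias score (NOT the paper's metric): the
    fraction of recommended items that also appear among the last k
    items of the user session. 1.0 means recommendations mirror only
    the most recent interactions; 0.0 means none do."""
    if not recommended:
        return 0.0
    recent = set(session[-k:])
    return sum(item in recent for item in recommended) / len(recommended)
```

A session `["a", "b", "c", "d"]` with recommendations `["c", "d", "x", "y"]` and `k=2` scores 0.5, since half the recommendations echo the two most recent items.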
"Measuring Recency Bias In Sequential Recommendation Systems" — Jeonglyul Oh, Sungzoon Cho. arXiv:2409.09722 [cs.IR], 2024-09-15.
Mohamed Sobhi Jabal, Pranav Warman, Jikai Zhang, Kartikeye Gupta, Ayush Jain, Maciej Mazurowski, Walter Wiggins, Kirti Magudia, Evan Calabrese
Purpose: To develop and evaluate an automated system for extracting structured clinical information from unstructured radiology and pathology reports using open-weights large language models (LMs) and retrieval-augmented generation (RAG), and to assess the effects of model configuration variables on extraction performance. Methods and Materials: The study utilized two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports annotated for isocitrate dehydrogenase (IDH) mutation status. An automated pipeline was developed to benchmark the performance of various LMs and RAG configurations. The impact of model size, quantization, prompting strategies, output formatting, and inference parameters was systematically evaluated. Results: The best-performing models achieved over 98% accuracy in extracting BT-RADS scores from radiology reports and over 90% for IDH mutation status extraction from pathology reports; the top model was a medically fine-tuned Llama 3. Larger, newer, and domain fine-tuned models consistently outperformed older and smaller models. Model quantization had minimal impact on performance. Few-shot prompting significantly improved accuracy. RAG improved performance for complex pathology reports but not for shorter radiology reports. Conclusions: Open LMs demonstrate significant potential for automated extraction of structured clinical data from unstructured clinical reports in local, privacy-preserving applications. Careful model selection, prompt engineering, and semi-automated optimization using annotated data are critical for optimal performance. These approaches could be reliable enough for practical use in research workflows, highlighting the potential for human-machine collaboration in healthcare data extraction.
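The benchmarking pipeline is not detailed in the abstract. A hedged sketch of one evaluation step — normalizing a free-text model answer to a BT-RADS-style score and computing extraction accuracy — where the regex is an assumption for illustration, not the authors' parser:

```python
import re

def normalize_btrads(text):
    """Pull a BT-RADS-style score (e.g. '3a') out of a free-text answer.
    The pattern is an assumption: BT-RADS scores run 0-4, some with a
    letter suffix; real reports may need a more careful parser."""
    m = re.search(r"\b([0-4][a-d]?)\b", text.lower())
    return m.group(1) if m else None

def accuracy(predictions, labels):
    """Fraction of model answers whose normalized score matches the label."""
    hits = sum(normalize_btrads(p) == l for p, l in zip(predictions, labels))
    return hits / len(labels)
```

Normalization like this matters when comparing configurations, since models differ in how verbosely they wrap the same extracted value.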
"Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports" — arXiv:2409.10576 [cs.IR], 2024-09-15.
Ranking a set of items based on their relevance to a given query is a core problem in search and recommendation. Transformer-based ranking models are the state-of-the-art approach for such tasks, but they score each query-item pair independently, ignoring the joint context of the other relevant items. This leads to sub-optimal ranking accuracy and high computational costs. In response, we propose Cross-encoders with Joint Efficient Modeling (CROSS-JEM), a novel ranking approach that enables transformer-based models to jointly score multiple items for a query, maximizing parameter utilization. CROSS-JEM leverages (a) redundancies and token overlaps to jointly score multiple items, which are typically the short-text phrases arising in search and recommendation, and (b) a novel training objective that models ranking probabilities. CROSS-JEM achieves state-of-the-art accuracy and over 4x lower ranking latency than standard cross-encoders. Our contributions are threefold: (i) we highlight the gap between the ranking application's need for scoring thousands of items per query and the limited capabilities of current cross-encoders; (ii) we introduce CROSS-JEM for joint efficient scoring of multiple items per query; and (iii) we demonstrate state-of-the-art accuracy on standard public datasets and a proprietary dataset. CROSS-JEM opens up new directions for designing tailored early-attention-based ranking models that incorporate strict production constraints such as item multiplicity and latency.
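The abstract mentions a training objective that models ranking probabilities. A standard listwise formulation (not necessarily CROSS-JEM's exact loss) treats the jointly produced scores for all candidates as logits and applies softmax cross-entropy against the relevant item:

```python
import numpy as np

def listwise_loss(scores, relevant_idx):
    """Softmax cross-entropy over the joint scores of all candidates for
    one query: the negative log of the 'ranking probability' assigned to
    the relevant item. Uses a numerically stable log-sum-exp."""
    scores = np.asarray(scores, dtype=float)
    m = scores.max()
    log_probs = scores - (m + np.log(np.exp(scores - m).sum()))
    return -log_probs[relevant_idx]
```

With uniform scores over four candidates the loss is log 4; raising the relevant item's score lowers it, which is the gradient signal a joint scorer trains on.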
"CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks" — Bhawna Paliwal, Deepak Saini, Mudit Dhawan, Siddarth Asokan, Nagarajan Natarajan, Surbhi Aggarwal, Pankaj Malhotra, Jian Jiao, Manik Varma. arXiv:2409.09795 [cs.IR], 2024-09-15.
Jun Yin, Zhengxin Zeng, Mingzheng Li, Hao Yan, Chaozhuo Li, Weihao Han, Jianjin Zhang, Ruochen Liu, Allen Sun, Denvy Deng, Feng Sun, Qi Zhang, Shirui Pan, Senzhang Wang
Owing to their unprecedented capabilities in semantic understanding and logical reasoning, pre-trained large language models (LLMs) have shown great potential for developing next-generation recommender systems (RSs). However, the static index paradigm adopted by current methods greatly restricts the use of LLM capacity for recommendation, leading not only to insufficient alignment between semantic and collaborative knowledge, but also to the neglect of high-order user-item interaction patterns. In this paper, we propose the Twin-Tower Dynamic Semantic Recommender (TTDS), the first generative RS to adopt a dynamic semantic index paradigm, aiming to resolve both problems simultaneously. More specifically, we devise a dynamic knowledge fusion framework that integrates a twin-tower semantic token generator into an LLM-based recommender, hierarchically allocating meaningful semantic indices for items and users and accordingly predicting the semantic index of the target item. Furthermore, a dual-modality variational auto-encoder is proposed to facilitate multi-grained alignment between semantic and collaborative knowledge. Finally, a series of novel tuning tasks specially customized for capturing high-order user-item interaction patterns is proposed to take advantage of users' historical behavior. Extensive experiments across three public datasets demonstrate the superiority of the proposed methodology for developing LLM-based generative RSs. The proposed TTDS recommender achieves an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG compared with the leading baseline methods.
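The reported Hit-Rate and NDCG metrics can be computed as follows for the common single-relevant-item case (a generic sketch, not the authors' evaluation code):

```python
import math

def hit_rate_at_k(ranked, target, k=10):
    """1.0 if the held-out target item appears in the top-k, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k=10):
    """NDCG with a single relevant item: the ideal DCG is 1/log2(2) = 1,
    so the score is just the discount at the target's rank."""
    for i, item in enumerate(ranked[:k]):
        if item == target:
            return 1.0 / math.log2(i + 2)
    return 0.0
```

Both are averaged over all test users; NDCG rewards placing the target higher, while Hit-Rate only checks presence in the top-k.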
"Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator" — arXiv:2409.09253 [cs.IR], 2024-09-14.
Farnoosh Javadi, Phanideep Gampa, Alyssa Woo, Xingxing Geng, Hang Zhang, Jose Sepulveda, Belhassen Bayar, Fei Wang
Streaming services have reshaped how we discover and engage with digital entertainment. Despite these advancements, effectively understanding the wide spectrum of user search queries continues to pose a significant challenge. An accurate query understanding system that can handle the variety of entities representing different user intents is essential for delivering an enhanced user experience. We can build such a system by training a natural language understanding (NLU) model; however, obtaining high-quality labeled training data in this specialized domain is a substantial obstacle. Manual annotation is costly and impractical for capturing users' vast vocabulary variations. To address this, we introduce a novel approach that leverages large language models (LLMs) through weak supervision to automatically annotate a vast collection of user search queries. Using prompt engineering and a diverse set of LLM personas, we generate training data that matches human annotator expectations. By incorporating domain knowledge via Chain of Thought and In-Context Learning, our approach leverages the labeled data to train low-latency models optimized for real-time inference. Extensive evaluations demonstrated that our approach outperformed the baseline with an average relative gain of 113% in recall. Furthermore, our novel prompt engineering framework yields higher-quality LLM-generated data for weak supervision; we observed a 47.60% improvement over baseline in the agreement rate between LLM predictions and human annotations with respect to F1 score, weighted according to the distribution of occurrences of the search queries. Our persona selection routing mechanism adds a further 3.67% increase in weighted F1 score on top of our novel prompt engineering framework.
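The weighting scheme described — agreement weighted by how often each query occurs in traffic — can be illustrated with a simplified sketch (the paper weights F1; plain agreement rate is used here for brevity):

```python
def weighted_agreement(rows):
    """rows: (query, occurrence_count, llm_label, human_label) tuples.
    Weight each query's LLM/human agreement by its share of search
    traffic, so frequent queries dominate the score."""
    total = sum(count for _, count, _, _ in rows)
    agree = sum(count for _, count, llm, human in rows if llm == human)
    return agree / total
```

For example, agreeing on a query seen 90 times but disagreeing on one seen 10 times yields 0.9, whereas an unweighted rate over unique queries would report 0.5.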
"LLM-based Weak Supervision Framework for Query Intent Classification in Video Search" — arXiv:2409.08931 [cs.IR], 2024-09-13.
Esmaeil Narimissa (Australian Taxation Office), David Raithel (Australian Taxation Office)
The performance of Retrieval-Augmented Generation (RAG) systems in information retrieval is significantly influenced by the characteristics of the documents being processed. In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies. A comparative evaluation of multiple document-splitting methods reveals that the Recursive Character Splitter outperforms the Token-based Splitter in preserving contextual integrity. A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs, simulating realistic retrieval scenarios to enhance testing efficiency and metric reliability. The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance. This approach establishes a refined standard for evaluating the precision of RAG systems, with future research focusing on optimizing chunk and overlap sizes to improve retrieval accuracy and efficiency.
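A recursive character splitter of the kind compared here tries coarse separators first and falls back to finer ones only when a piece is still too long. A minimal sketch, with an illustrative separator list and chunk size (not the study's exact configuration):

```python
def recursive_split(text, max_len=200, seps=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters, preferring
    to break at coarse separators (paragraphs) before fine ones (words)."""
    if len(text) <= max_len:
        return [text]
    if not seps:
        # hard character cut as a last resort
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = seps[0], seps[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate        # keep packing into the same chunk
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    # recurse with finer separators on any chunk still over the limit
    out = []
    for c in chunks:
        out.extend(recursive_split(c, max_len, rest))
    return out
```

Because it prefers paragraph and sentence boundaries, this strategy tends to preserve contextual integrity better than cutting at fixed token counts, which matches the comparison reported above.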
"Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods" — arXiv:2409.08479 [cs.IR], 2024-09-13.
Scholarly communication is a rapidly growing field containing a wealth of knowledge. However, because this knowledge is locked in unstructured documents, it is challenging to extract useful information from them through conventional document retrieval methods. Scholarly knowledge graphs solve this problem by representing the documents in a semantic network, providing hidden insights, summaries, and ease of accessibility through queries. Naturally, question answering over scholarly graphs expands this accessibility to a wider audience. But some of the knowledge in this domain is still presented as unstructured text, thus requiring a hybrid solution for question answering systems. In this paper, we present a two-step solution using the open-source Large Language Model (LLM) Llama 3.1 for the Scholarly-QALD dataset. First, we extract the context pertaining to the question from different structured and unstructured data sources: the DBLP and SemOpenAlex knowledge graphs and Wikipedia text. Second, we apply prompt engineering to improve the information retrieval performance of the LLM. Our approach achieved an F1 score of 40%; we also observed some anomalous responses from the LLM, which are discussed in the final part of the paper.
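The first step — gathering context from structured and unstructured sources into a single prompt — might look like the sketch below; the prompt wording and truncation policy are assumptions for illustration, not the paper's exact template:

```python
def build_prompt(question, kg_facts, text_passages, max_chars=4000):
    """Assemble a grounding context from structured (knowledge-graph)
    facts and unstructured text passages, then wrap it in a QA prompt.
    Truncation to max_chars is a crude stand-in for context budgeting."""
    parts = ["Facts:"] + [f"- {fact}" for fact in kg_facts]
    parts += ["", "Passages:"] + list(text_passages)
    context = "\n".join(parts)[:max_chars]
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string would then be sent to the LLM, with prompt-engineering iterations on the instruction line and the fact/passage ordering.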
{"title":"Contri(e)ve: Context + Retrieve for Scholarly Question Answering","authors":"Kanchan Shivashankar, Nadine Steinmetz","doi":"arxiv-2409.09010","DOIUrl":"https://doi.org/arxiv-2409.09010","url":null,"abstract":"Scholarly communication is a rapid growing field containing a wealth of\u0000knowledge. However, due to its unstructured and document format, it is\u0000challenging to extract useful information from them through conventional\u0000document retrieval methods. Scholarly knowledge graphs solve this problem, by\u0000representing the documents in a semantic network, providing, hidden insights,\u0000summaries and ease of accessibility through queries. Naturally, question\u0000answering for scholarly graphs expands the accessibility to a wider audience.\u0000But some of the knowledge in this domain is still presented as unstructured\u0000text, thus requiring a hybrid solution for question answering systems. In this\u0000paper, we present a two step solution using open source Large Language\u0000Model(LLM): Llama3.1 for Scholarly-QALD dataset. Firstly, we extract the\u0000context pertaining to the question from different structured and unstructured\u0000data sources: DBLP, SemOpenAlex knowledge graphs and Wikipedia text. Secondly,\u0000we implement prompt engineering to improve the information retrieval\u0000performance of the LLM. 
Our approach achieved an F1 score of 40% and also\u0000observed some anomalous responses from the LLM, that are discussed in the final\u0000part of the paper.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
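The two-step pattern described above (gather context from heterogeneous sources, then engineer a prompt for the LLM) can be sketched as follows. This is a minimal illustration only: the prompt template and the example snippets are assumptions, not the authors' actual implementation.

```python
def build_prompt(question, contexts):
    """Assemble snippets retrieved from heterogeneous sources into one prompt.

    `contexts` is a list of (source_name, snippet) pairs, e.g. facts pulled
    from a knowledge graph plus free text from an encyclopedia article.
    """
    parts = ["Answer the question using only the context below."]
    for source, snippet in contexts:
        parts.append(f"[{source}] {snippet}")
    parts.append(f"Question: {question}")
    parts.append("Answer:")
    return "\n".join(parts)

# Illustrative retrieved context (hypothetical facts, for demonstration only).
contexts = [
    ("DBLP", "Paper P was published at Conference C in 2020."),
    ("SemOpenAlex", "Author A is affiliated with Institution I."),
    ("Wikipedia", "Conference C is an annual information retrieval venue."),
]
prompt = build_prompt("Where was Paper P published?", contexts)
```

The assembled `prompt` string would then be passed to the LLM; the second step in the paper, prompt engineering, amounts to iterating on the instruction and context layout in such a template.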
Recommender Systems (RS) play a pivotal role in boosting user satisfaction by providing personalized product suggestions in domains such as e-commerce and entertainment. This study examines the integration of multimodal data (text and audio) into large language models (LLMs) with the aim of enhancing recommendation performance. Traditional text and audio recommenders encounter limitations such as the cold-start problem, and recent advancements in LLMs, while promising, are computationally expensive. To address these issues, Low-Rank Adaptation (LoRA) is introduced, which improves efficiency without compromising performance. The ATFLRec framework is proposed to integrate audio and text modalities into a multimodal recommendation system, utilizing various LoRA configurations and modality fusion techniques. Results indicate that ATFLRec outperforms baseline models, including traditional and graph neural network-based approaches, achieving higher AUC scores. Furthermore, fine-tuning audio and text data separately with distinct LoRA modules yields the best performance, with the choice of pooling method and number of Mel filter banks significantly impacting results. This research offers valuable insights into optimizing multimodal recommender systems and advancing the integration of diverse data modalities in LLMs.
{"title":"ATFLRec: A Multimodal Recommender System with Audio-Text Fusion and Low-Rank Adaptation via Instruction-Tuned Large Language Model","authors":"Zezheng Qin","doi":"arxiv-2409.08543","DOIUrl":"https://doi.org/arxiv-2409.08543","url":null,"abstract":"Recommender Systems (RS) play a pivotal role in boosting user satisfaction by\u0000providing personalized product suggestions in domains such as e-commerce and\u0000entertainment. This study examines the integration of multimodal data text and\u0000audio into large language models (LLMs) with the aim of enhancing\u0000recommendation performance. Traditional text and audio recommenders encounter\u0000limitations such as the cold-start problem, and recent advancements in LLMs,\u0000while promising, are computationally expensive. To address these issues,\u0000Low-Rank Adaptation (LoRA) is introduced, which enhances efficiency without\u0000compromising performance. The ATFLRec framework is proposed to integrate audio\u0000and text modalities into a multimodal recommendation system, utilizing various\u0000LoRA configurations and modality fusion techniques. Results indicate that\u0000ATFLRec outperforms baseline models, including traditional and graph neural\u0000network-based approaches, achieving higher AUC scores. Furthermore, separate\u0000fine-tuning of audio and text data with distinct LoRA modules yields optimal\u0000performance, with different pooling methods and Mel filter bank numbers\u0000significantly impacting performance. 
This research offers valuable insights\u0000into optimizing multimodal recommender systems and advancing the integration of\u0000diverse data modalities in LLMs.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
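The core idea behind Low-Rank Adaptation, as used in the abstract above, is to freeze a large weight matrix W and train only a low-rank update: the effective weight becomes W + scale * (B @ A), where A is r x d_in and B is d_out x r with rank r much smaller than either dimension. A minimal dependency-free sketch (plain-Python matrices; not ATFLRec's actual code):

```python
def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass through a frozen weight W plus the low-rank update B @ A.

    x: batch of row vectors (n x d_in); W: frozen weights (d_out x d_in);
    A: (r x d_in) and B: (d_out x r) are the only trainable matrices,
    so the update costs r*(d_in + d_out) parameters instead of d_in*d_out.
    """
    delta = matmul(B, A)                       # low-rank weight update (d_out x d_in)
    W_eff = [[w + alpha * d for w, d in zip(rw, rd)]
             for rw, rd in zip(W, delta)]      # W + alpha * B @ A
    W_eff_T = [list(col) for col in zip(*W_eff)]
    return matmul(x, W_eff_T)                  # y = x @ W_eff^T
```

Fine-tuning each modality with its own (A, B) pair, as the abstract reports, keeps the shared backbone frozen while letting the audio and text branches adapt independently.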