Causality is receiving increasing attention from the artificial intelligence and machine learning communities. This paper gives an example of modelling a recommender system problem using causal graphs. Specifically, we approached the causal discovery task of learning a causal graph by combining observational data from an open-source dataset with prior knowledge. The resulting causal graph shows that only a few variables effectively influence the analysed feedback signals. This contrasts with the recent trend in the machine learning community of including ever more variables in massive models such as neural networks.
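The abstract does not spell out the discovery procedure. Constraint-based algorithms such as PC build a causal graph from conditional-independence tests; the sketch below shows one such test (a Fisher-z test under Gaussian assumptions), with illustrative function names and thresholds that are not taken from the paper:

```python
import math
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out the conditioning set z."""
    if z.size:
        # residualize x and y on z via least squares
        beta_x, *_ = np.linalg.lstsq(z, x, rcond=None)
        beta_y, *_ = np.linalg.lstsq(z, y, rcond=None)
        x = x - z @ beta_x
        y = y - z @ beta_y
    return float(np.corrcoef(x, y)[0, 1])

def independent(x, y, z, n, alpha=0.05):
    """Fisher-z test: True if x and y look conditionally independent given z."""
    r = partial_corr(x, y, z)
    r = max(min(r, 0.999999), -0.999999)           # guard atanh
    z_stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - z.shape[1] - 3)
    # two-sided p-value against the standard normal
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z_stat) / math.sqrt(2))))
    return p > alpha
```

In a chain X → Y → W, the test should reject marginal independence of X and W but accept it once Y is conditioned on; prior knowledge enters such algorithms as forbidden or required edges that skip tests entirely.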
"Causal Discovery in Recommender Systems: Example and Discussion" — Emanuele Cavenaghi, Fabio Stella, Markus Zanker. arXiv:2409.10271 [cs.IR], 2024-09-16.
In recent years, there has been a surge in the publication of clinical trial reports, making it challenging to conduct systematic reviews. Automatically extracting Population, Intervention, Comparator, and Outcome (PICO) elements from clinical trial studies can alleviate the traditionally time-consuming process of manually scrutinizing reports for systematic reviews. Existing approaches to PICO frame extraction are supervised, relying on manually annotated data points in the form of BIO label tagging. Even recent techniques such as In-Context Learning (ICL), which has proven effective for a number of downstream NLP tasks, require labeled examples. In this work, we adopt an ICL strategy that exploits the knowledge Large Language Models (LLMs) gather during pretraining to extract PICO-related terminology from clinical trial documents in an unsupervised setup, bypassing the need for a large number of annotated instances. Additionally, to showcase the effectiveness of LLMs in the oracle scenario where many annotated samples are available, we adopt an instruction-tuning strategy, employing Low-Rank Adaptation (LoRA) to train a very large model for the PICO frame extraction task in a low-resource environment. Our empirical results show that the proposed ICL-based framework produces comparable results on all versions of the EBM-NLP dataset, and the instruction-tuned version of our framework achieves state-of-the-art results on all of them. Our project is available at https://github.com/shrimonmuke0202/AlpaPICO.git.
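BIO label tagging frames PICO extraction as token classification. A minimal sketch of the decoding step, collapsing BIO tags into labelled spans (the `POP`/`INT` label names are illustrative, not the exact EBM-NLP inventory):

```python
def bio_to_spans(tokens, tags):
    """Collapse BIO tags (e.g. B-POP, I-POP, O) into (label, text) spans."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:                       # close the previous span
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)               # continue the open span
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans
```

For example, tokens `["120", "adults", "received", "metformin", "daily"]` with tags `["B-POP", "I-POP", "O", "B-INT", "O"]` decode to a Population span "120 adults" and an Intervention span "metformin".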
"AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs" — Madhusudan Ghosh, Shrimon Mukherjee, Asmit Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar, Debasis Ganguly. arXiv:2409.09704 [cs.IR], 2024-09-15.
Recency bias in a sequential recommendation system refers to an overly high emphasis placed on recent items within a user session. This bias can diminish the serendipity of recommendations and hinder the system's ability to capture users' long-term interests, leading to user disengagement. We propose a simple yet effective novel metric specifically designed to quantify recency bias. Our findings demonstrate that high recency bias, as measured by the proposed metric, adversely impacts recommendation performance, and that mitigating it improves performance across all models evaluated in our experiments, highlighting the importance of measuring recency bias.
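The abstract does not define the proposed metric, so as a purely hypothetical illustration of the idea, one could quantify recency bias as the share of a model's recommendations drawn from the most recent items in the session:

```python
def recency_bias(session, recommended, k=3):
    """Hypothetical recency-bias score (NOT the paper's metric): the
    fraction of recommended items that also appear among the last k
    items of the user session. 1.0 means recommendations mirror only
    the most recent interactions; 0.0 means none do."""
    if not recommended:
        return 0.0
    recent = set(session[-k:])
    return sum(item in recent for item in recommended) / len(recommended)
```

A session `["a", "b", "c", "d"]` with recommendations `["c", "d", "x", "y"]` and `k=2` scores 0.5, since half the recommendations echo the two most recent items.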
"Measuring Recency Bias In Sequential Recommendation Systems" — Jeonglyul Oh, Sungzoon Cho. arXiv:2409.09722 [cs.IR], 2024-09-15.
Mohamed Sobhi Jabal, Pranav Warman, Jikai Zhang, Kartikeye Gupta, Ayush Jain, Maciej Mazurowski, Walter Wiggins, Kirti Magudia, Evan Calabrese
Purpose: To develop and evaluate an automated system for extracting structured clinical information from unstructured radiology and pathology reports using open-weights large language models (LMs) and retrieval-augmented generation (RAG), and to assess the effects of model configuration variables on extraction performance. Methods and Materials: The study utilized two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports annotated for isocitrate dehydrogenase (IDH) mutation status. An automated pipeline was developed to benchmark the performance of various LMs and RAG configurations. The impact of model size, quantization, prompting strategies, output formatting, and inference parameters was systematically evaluated. Results: The best-performing models achieved over 98% accuracy in extracting BT-RADS scores from radiology reports and over 90% for IDH mutation status extraction from pathology reports; the top model was a medically fine-tuned Llama 3. Larger, newer, and domain fine-tuned models consistently outperformed older and smaller models. Model quantization had minimal impact on performance. Few-shot prompting significantly improved accuracy. RAG improved performance for complex pathology reports but not for shorter radiology reports. Conclusions: Open LMs demonstrate significant potential for automated extraction of structured clinical data from unstructured clinical reports in local, privacy-preserving applications. Careful model selection, prompt engineering, and semi-automated optimization using annotated data are critical for optimal performance. These approaches could be reliable enough for practical use in research workflows, highlighting the potential for human-machine collaboration in healthcare data extraction.
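The benchmarking pipeline is not detailed in the abstract. A hedged sketch of one evaluation step — normalizing a free-text model answer to a BT-RADS-style score and computing extraction accuracy — where the regex is an assumption for illustration, not the authors' parser:

```python
import re

def normalize_btrads(text):
    """Pull a BT-RADS-style score (e.g. '3a') out of a free-text answer.
    The pattern is an assumption: BT-RADS scores run 0-4, some with a
    letter suffix; real reports may need a more careful parser."""
    m = re.search(r"\b([0-4][a-d]?)\b", text.lower())
    return m.group(1) if m else None

def accuracy(predictions, labels):
    """Fraction of model answers whose normalized score matches the label."""
    hits = sum(normalize_btrads(p) == l for p, l in zip(predictions, labels))
    return hits / len(labels)
```

Normalization like this matters when comparing configurations, since models differ in how verbosely they wrap the same extracted value.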
"Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports" — arXiv:2409.10576 [cs.IR], 2024-09-15.
Ranking a set of items based on their relevance to a given query is a core problem in search and recommendation. Transformer-based ranking models are the state-of-the-art approach for such tasks, but they score each query-item pair independently, ignoring the joint context of the other relevant items. This leads to sub-optimal ranking accuracy and high computational costs. In response, we propose Cross-encoders with Joint Efficient Modeling (CROSS-JEM), a novel ranking approach that enables transformer-based models to jointly score multiple items for a query, maximizing parameter utilization. CROSS-JEM leverages (a) redundancies and token overlaps to jointly score multiple items, which are typically the short-text phrases arising in search and recommendation, and (b) a novel training objective that models ranking probabilities. CROSS-JEM achieves state-of-the-art accuracy and over 4x lower ranking latency than standard cross-encoders. Our contributions are threefold: (i) we highlight the gap between the ranking application's need for scoring thousands of items per query and the limited capabilities of current cross-encoders; (ii) we introduce CROSS-JEM for joint efficient scoring of multiple items per query; and (iii) we demonstrate state-of-the-art accuracy on standard public datasets and a proprietary dataset. CROSS-JEM opens up new directions for designing tailored early-attention-based ranking models that incorporate strict production constraints such as item multiplicity and latency.
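The abstract mentions a training objective that models ranking probabilities. A standard listwise formulation (not necessarily CROSS-JEM's exact loss) treats the jointly produced scores for all candidates as logits and applies softmax cross-entropy against the relevant item:

```python
import numpy as np

def listwise_loss(scores, relevant_idx):
    """Softmax cross-entropy over the joint scores of all candidates for
    one query: the negative log of the 'ranking probability' assigned to
    the relevant item. Uses a numerically stable log-sum-exp."""
    scores = np.asarray(scores, dtype=float)
    m = scores.max()
    log_probs = scores - (m + np.log(np.exp(scores - m).sum()))
    return -log_probs[relevant_idx]
```

With uniform scores over four candidates the loss is log 4; raising the relevant item's score lowers it, which is the gradient signal a joint scorer trains on.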
"CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks" — Bhawna Paliwal, Deepak Saini, Mudit Dhawan, Siddarth Asokan, Nagarajan Natarajan, Surbhi Aggarwal, Pankaj Malhotra, Jian Jiao, Manik Varma. arXiv:2409.09795 [cs.IR], 2024-09-15.
Jun Yin, Zhengxin Zeng, Mingzheng Li, Hao Yan, Chaozhuo Li, Weihao Han, Jianjin Zhang, Ruochen Liu, Allen Sun, Denvy Deng, Feng Sun, Qi Zhang, Shirui Pan, Senzhang Wang
Owing to their unprecedented capabilities in semantic understanding and logical reasoning, pre-trained large language models (LLMs) have shown great potential for developing next-generation recommender systems (RSs). However, the static index paradigm adopted by current methods greatly restricts the use of LLM capacity for recommendation, leading not only to insufficient alignment between semantic and collaborative knowledge, but also to the neglect of high-order user-item interaction patterns. In this paper, we propose the Twin-Tower Dynamic Semantic Recommender (TTDS), the first generative RS to adopt a dynamic semantic index paradigm, aiming to resolve both problems simultaneously. More specifically, we devise a dynamic knowledge fusion framework that integrates a twin-tower semantic token generator into an LLM-based recommender, hierarchically allocating meaningful semantic indices for items and users and accordingly predicting the semantic index of the target item. Furthermore, a dual-modality variational auto-encoder is proposed to facilitate multi-grained alignment between semantic and collaborative knowledge. Finally, a series of novel tuning tasks specially customized for capturing high-order user-item interaction patterns is proposed to take advantage of users' historical behavior. Extensive experiments across three public datasets demonstrate the superiority of the proposed methodology for developing LLM-based generative RSs. The proposed TTDS recommender achieves an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG compared with the leading baseline methods.
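The reported Hit-Rate and NDCG metrics can be computed as follows for the common single-relevant-item case (a generic sketch, not the authors' evaluation code):

```python
import math

def hit_rate_at_k(ranked, target, k=10):
    """1.0 if the held-out target item appears in the top-k, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k=10):
    """NDCG with a single relevant item: the ideal DCG is 1/log2(2) = 1,
    so the score is just the discount at the target's rank."""
    for i, item in enumerate(ranked[:k]):
        if item == target:
            return 1.0 / math.log2(i + 2)
    return 0.0
```

Both are averaged over all test users; NDCG rewards placing the target higher, while Hit-Rate only checks presence in the top-k.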
"Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator" — arXiv:2409.09253 [cs.IR], 2024-09-14.
Farnoosh Javadi, Phanideep Gampa, Alyssa Woo, Xingxing Geng, Hang Zhang, Jose Sepulveda, Belhassen Bayar, Fei Wang
Streaming services have reshaped how we discover and engage with digital entertainment. Despite these advancements, effectively understanding the wide spectrum of user search queries continues to pose a significant challenge. An accurate query understanding system that can handle the variety of entities representing different user intents is essential for delivering an enhanced user experience. We can build such a system by training a natural language understanding (NLU) model; however, obtaining high-quality labeled training data in this specialized domain is a substantial obstacle. Manual annotation is costly and impractical for capturing users' vast vocabulary variations. To address this, we introduce a novel approach that leverages large language models (LLMs) through weak supervision to automatically annotate a vast collection of user search queries. Using prompt engineering and a diverse set of LLM personas, we generate training data that matches human annotator expectations. By incorporating domain knowledge via Chain of Thought and In-Context Learning, our approach leverages the labeled data to train low-latency models optimized for real-time inference. Extensive evaluations demonstrated that our approach outperformed the baseline with an average relative gain of 113% in recall. Furthermore, our novel prompt engineering framework yields higher-quality LLM-generated data for weak supervision; we observed a 47.60% improvement over baseline in the agreement rate between LLM predictions and human annotations with respect to F1 score, weighted according to the distribution of occurrences of the search queries. Our persona selection routing mechanism adds a further 3.67% increase in weighted F1 score on top of our novel prompt engineering framework.
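The weighting scheme described — agreement weighted by how often each query occurs in traffic — can be illustrated with a simplified sketch (the paper weights F1; plain agreement rate is used here for brevity):

```python
def weighted_agreement(rows):
    """rows: (query, occurrence_count, llm_label, human_label) tuples.
    Weight each query's LLM/human agreement by its share of search
    traffic, so frequent queries dominate the score."""
    total = sum(count for _, count, _, _ in rows)
    agree = sum(count for _, count, llm, human in rows if llm == human)
    return agree / total
```

For example, agreeing on a query seen 90 times but disagreeing on one seen 10 times yields 0.9, whereas an unweighted rate over unique queries would report 0.5.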
"LLM-based Weak Supervision Framework for Query Intent Classification in Video Search" — arXiv:2409.08931 [cs.IR], 2024-09-13.
Esmaeil Narimissa (Australian Taxation Office), David Raithel (Australian Taxation Office)
The performance of Retrieval-Augmented Generation (RAG) systems in information retrieval is significantly influenced by the characteristics of the documents being processed. In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies. A comparative evaluation of multiple document-splitting methods reveals that the Recursive Character Splitter outperforms the Token-based Splitter in preserving contextual integrity. A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs, simulating realistic retrieval scenarios to enhance testing efficiency and metric reliability. The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance. This approach establishes a refined standard for evaluating the precision of RAG systems, with future research focusing on optimizing chunk and overlap sizes to improve retrieval accuracy and efficiency.
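A recursive character splitter of the kind compared here tries coarse separators first and falls back to finer ones only when a piece is still too long. A minimal sketch, with an illustrative separator list and chunk size (not the study's exact configuration):

```python
def recursive_split(text, max_len=200, seps=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters, preferring
    to break at coarse separators (paragraphs) before fine ones (words)."""
    if len(text) <= max_len:
        return [text]
    if not seps:
        # hard character cut as a last resort
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = seps[0], seps[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate        # keep packing into the same chunk
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    # recurse with finer separators on any chunk still over the limit
    out = []
    for c in chunks:
        out.extend(recursive_split(c, max_len, rest))
    return out
```

Because it prefers paragraph and sentence boundaries, this strategy tends to preserve contextual integrity better than cutting at fixed token counts, which matches the comparison reported above.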
"Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods" — arXiv:2409.08479 [cs.IR], 2024-09-13.
Scholarly communication is a rapidly growing field containing a wealth of knowledge. However, because this knowledge is locked in unstructured documents, it is challenging to extract useful information from them through conventional document retrieval methods. Scholarly knowledge graphs solve this problem by representing the documents in a semantic network, providing hidden insights, summaries, and ease of accessibility through queries. Naturally, question answering over scholarly graphs expands this accessibility to a wider audience. But some of the knowledge in this domain is still presented as unstructured text, thus requiring a hybrid solution for question answering systems. In this paper, we present a two-step solution using the open-source Large Language Model (LLM) Llama 3.1 for the Scholarly-QALD dataset. First, we extract the context pertaining to the question from different structured and unstructured data sources: the DBLP and SemOpenAlex knowledge graphs and Wikipedia text. Second, we apply prompt engineering to improve the information retrieval performance of the LLM. Our approach achieved an F1 score of 40%; we also observed some anomalous responses from the LLM, which are discussed in the final part of the paper.
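The first step — gathering context from structured and unstructured sources into a single prompt — might look like the sketch below; the prompt wording and truncation policy are assumptions for illustration, not the paper's exact template:

```python
def build_prompt(question, kg_facts, text_passages, max_chars=4000):
    """Assemble a grounding context from structured (knowledge-graph)
    facts and unstructured text passages, then wrap it in a QA prompt.
    Truncation to max_chars is a crude stand-in for context budgeting."""
    parts = ["Facts:"] + [f"- {fact}" for fact in kg_facts]
    parts += ["", "Passages:"] + list(text_passages)
    context = "\n".join(parts)[:max_chars]
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string would then be sent to the LLM, with prompt-engineering iterations on the instruction line and the fact/passage ordering.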
{"title":"Contri(e)ve: Context + Retrieve for Scholarly Question Answering","authors":"Kanchan Shivashankar, Nadine Steinmetz","doi":"arxiv-2409.09010","DOIUrl":"https://doi.org/arxiv-2409.09010","url":null,"abstract":"Scholarly communication is a rapid growing field containing a wealth of\u0000knowledge. However, due to its unstructured and document format, it is\u0000challenging to extract useful information from them through conventional\u0000document retrieval methods. Scholarly knowledge graphs solve this problem, by\u0000representing the documents in a semantic network, providing, hidden insights,\u0000summaries and ease of accessibility through queries. Naturally, question\u0000answering for scholarly graphs expands the accessibility to a wider audience.\u0000But some of the knowledge in this domain is still presented as unstructured\u0000text, thus requiring a hybrid solution for question answering systems. In this\u0000paper, we present a two step solution using open source Large Language\u0000Model(LLM): Llama3.1 for Scholarly-QALD dataset. Firstly, we extract the\u0000context pertaining to the question from different structured and unstructured\u0000data sources: DBLP, SemOpenAlex knowledge graphs and Wikipedia text. Secondly,\u0000we implement prompt engineering to improve the information retrieval\u0000performance of the LLM. 
Our approach achieved an F1 score of 40% and also\u0000observed some anomalous responses from the LLM, that are discussed in the final\u0000part of the paper.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
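The two-step pattern described above (gather context from heterogeneous sources, then engineer a prompt for the LLM) can be sketched as follows. This is a minimal illustration only: the prompt template and the example snippets are assumptions, not the authors' actual implementation.

```python
def build_prompt(question, contexts):
    """Assemble snippets retrieved from heterogeneous sources into one prompt.

    `contexts` is a list of (source_name, snippet) pairs, e.g. facts pulled
    from a knowledge graph plus free text from an encyclopedia article.
    """
    parts = ["Answer the question using only the context below."]
    for source, snippet in contexts:
        parts.append(f"[{source}] {snippet}")
    parts.append(f"Question: {question}")
    parts.append("Answer:")
    return "\n".join(parts)

# Illustrative retrieved context (hypothetical facts, for demonstration only).
contexts = [
    ("DBLP", "Paper P was published at Conference C in 2020."),
    ("SemOpenAlex", "Author A is affiliated with Institution I."),
    ("Wikipedia", "Conference C is an annual information retrieval venue."),
]
prompt = build_prompt("Where was Paper P published?", contexts)
```

The assembled `prompt` string would then be passed to the LLM; the second step in the paper, prompt engineering, amounts to iterating on the instruction and context layout in such a template.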
Recommender Systems (RS) play a pivotal role in boosting user satisfaction by providing personalized product suggestions in domains such as e-commerce and entertainment. This study examines the integration of multimodal data (text and audio) into large language models (LLMs) with the aim of enhancing recommendation performance. Traditional text and audio recommenders encounter limitations such as the cold-start problem, and recent advancements in LLMs, while promising, are computationally expensive. To address these issues, Low-Rank Adaptation (LoRA) is introduced, which improves efficiency without compromising performance. The ATFLRec framework is proposed to integrate audio and text modalities into a multimodal recommendation system, utilizing various LoRA configurations and modality fusion techniques. Results indicate that ATFLRec outperforms baseline models, including traditional and graph neural network-based approaches, achieving higher AUC scores. Furthermore, fine-tuning audio and text data separately with distinct LoRA modules yields the best performance, with the choice of pooling method and number of Mel filter banks significantly impacting results. This research offers valuable insights into optimizing multimodal recommender systems and advancing the integration of diverse data modalities in LLMs.
{"title":"ATFLRec: A Multimodal Recommender System with Audio-Text Fusion and Low-Rank Adaptation via Instruction-Tuned Large Language Model","authors":"Zezheng Qin","doi":"arxiv-2409.08543","DOIUrl":"https://doi.org/arxiv-2409.08543","url":null,"abstract":"Recommender Systems (RS) play a pivotal role in boosting user satisfaction by\u0000providing personalized product suggestions in domains such as e-commerce and\u0000entertainment. This study examines the integration of multimodal data text and\u0000audio into large language models (LLMs) with the aim of enhancing\u0000recommendation performance. Traditional text and audio recommenders encounter\u0000limitations such as the cold-start problem, and recent advancements in LLMs,\u0000while promising, are computationally expensive. To address these issues,\u0000Low-Rank Adaptation (LoRA) is introduced, which enhances efficiency without\u0000compromising performance. The ATFLRec framework is proposed to integrate audio\u0000and text modalities into a multimodal recommendation system, utilizing various\u0000LoRA configurations and modality fusion techniques. Results indicate that\u0000ATFLRec outperforms baseline models, including traditional and graph neural\u0000network-based approaches, achieving higher AUC scores. Furthermore, separate\u0000fine-tuning of audio and text data with distinct LoRA modules yields optimal\u0000performance, with different pooling methods and Mel filter bank numbers\u0000significantly impacting performance. 
This research offers valuable insights\u0000into optimizing multimodal recommender systems and advancing the integration of\u0000diverse data modalities in LLMs.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
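The core idea behind Low-Rank Adaptation, as used in the abstract above, is to freeze a large weight matrix W and train only a low-rank update: the effective weight becomes W + scale * (B @ A), where A is r x d_in and B is d_out x r with rank r much smaller than either dimension. A minimal dependency-free sketch (plain-Python matrices; not ATFLRec's actual code):

```python
def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass through a frozen weight W plus the low-rank update B @ A.

    x: batch of row vectors (n x d_in); W: frozen weights (d_out x d_in);
    A: (r x d_in) and B: (d_out x r) are the only trainable matrices,
    so the update costs r*(d_in + d_out) parameters instead of d_in*d_out.
    """
    delta = matmul(B, A)                       # low-rank weight update (d_out x d_in)
    W_eff = [[w + alpha * d for w, d in zip(rw, rd)]
             for rw, rd in zip(W, delta)]      # W + alpha * B @ A
    W_eff_T = [list(col) for col in zip(*W_eff)]
    return matmul(x, W_eff_T)                  # y = x @ W_eff^T
```

Fine-tuning each modality with its own (A, B) pair, as the abstract reports, keeps the shared backbone frozen while letting the audio and text branches adapt independently.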