
arXiv - CS - Computation and Language: Latest Publications

A Controlled Study on Long Context Extension and Generalization in LLMs
Pub Date : 2024-09-18 DOI: arxiv-2409.12181
Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, Zhiyong Wu, Alexander M. Rush
Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. However, owing to differences in data and model classes, it has been challenging to compare these approaches, leading to uncertainty as to how to evaluate long-context performance and whether it differs from standard evaluation. We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data. Our study yields several insights into long-context behavior. First, we reaffirm the critical role of perplexity as a general-purpose performance indicator even in longer-context tasks. Second, we find that current approximate attention methods systematically underperform across long-context tasks. Finally, we confirm that exact fine-tuning based methods are generally effective within the range of their extension, whereas extrapolation remains challenging. All codebases, models, and checkpoints will be made available open-source, promoting transparency and facilitating further research in this critical area of AI development.
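As a concrete illustration of the perplexity metric highlighted in this abstract, the following minimal Python sketch computes perplexity from per-token log-probabilities. The numeric values are hypothetical stand-ins for what a language model might assign to the same passage with and without the full document context; this is not the paper's evaluation code.

    import math
    from typing import Sequence

    def perplexity(token_logprobs: Sequence[float]) -> float:
        # Perplexity is the exponential of the negative mean log-probability per token;
        # lower values mean the model finds the text more predictable.
        if not token_logprobs:
            raise ValueError("need at least one token log-probability")
        return math.exp(-sum(token_logprobs) / len(token_logprobs))

    # Hypothetical per-token log-probabilities for the same passage scored with a
    # truncated context versus the full document context.
    logprobs_truncated_context = [-2.1, -1.8, -2.5, -2.0]
    logprobs_full_context = [-1.2, -0.9, -1.4, -1.1]

    print(perplexity(logprobs_truncated_context))  # ~8.17, higher perplexity
    print(perplexity(logprobs_full_context))       # ~3.16, lower perplexity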
Citations: 0
You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL
Pub Date : 2024-09-18 DOI: arxiv-2409.12172
Hideo Kobayashi, Wuwei Lan, Peng Shi, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng
While significant progress has been made on the text-to-SQL task, recent solutions repeatedly encode the same database schema for every question, resulting in unnecessarily high inference cost and often overlooking crucial database knowledge. To address these issues, we propose You Only Read Once (YORO), a novel paradigm that directly internalizes database knowledge into the parametric knowledge of a text-to-SQL model during training and eliminates the need for schema encoding during inference. YORO significantly reduces the input token length by 66%-98%. Despite its shorter inputs, our empirical results demonstrate YORO's competitive performance with traditional systems on three benchmarks as well as its significant outperformance on large databases. Furthermore, YORO excels in handling questions with challenging value retrievals such as abbreviation.
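A rough sketch of the contrast described in this abstract, using hypothetical prompt builders: the conventional approach serializes the schema into every prompt, while a YORO-style model is assumed to carry the schema in its fine-tuned weights, so the inference-time prompt contains only the question. This illustrates the idea, not the authors' actual implementation.

    # Hypothetical schema; in practice this can be thousands of tokens long.
    SCHEMA_DDL = (
        "CREATE TABLE employees(id INT, name TEXT, dept_id INT, salary INT);\n"
        "CREATE TABLE departments(id INT, name TEXT);"
    )

    def prompt_with_schema(question: str) -> str:
        # Traditional text-to-SQL: the schema is re-encoded for every request.
        return f"{SCHEMA_DDL}\n\n-- Question: {question}\n-- SQL:"

    def prompt_schema_free(question: str) -> str:
        # YORO-style inference: database knowledge is assumed to be internalized
        # during training, so no schema encoding is sent at inference time.
        return f"-- Question: {question}\n-- SQL:"

    q = "What is the average salary per department?"
    print(len(prompt_with_schema(q)), "characters with schema in the prompt")
    print(len(prompt_schema_free(q)), "characters without schema in the prompt")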
Citations: 0
Using Large Language Models to Generate Clinical Trial Tables and Figures
Pub Date : 2024-09-18 DOI: arxiv-2409.12046
Yumeng Yang, Peter Krusche, Kristyn Pantoja, Cheng Shi, Ethan Ludmir, Kirk Roberts, Gen Zhu
Tables, figures, and listings (TFLs) are essential tools for summarizing clinical trial data. Creation of TFLs for reporting activities is often a time-consuming task encountered routinely during the execution of clinical trials. This study explored the use of large language models (LLMs) to automate the generation of TFLs through prompt engineering and few-shot transfer learning. Using public clinical trial data in ADaM format, our results demonstrated that LLMs can efficiently generate TFLs with prompt instructions, showcasing their potential in this domain. Furthermore, we developed a conversational agent named Clinical Trial TFL Generation Agent: an app that matches user queries to predefined prompts that produce customized programs to generate specific predefined TFLs.
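The query-to-prompt matching described above might look roughly like the sketch below; the keywords and prompt templates are invented for illustration and are not the authors' actual agent.

    # Hypothetical registry of predefined prompts keyed by simple keywords.
    PREDEFINED_PROMPTS = {
        "demographics": "Write an R program that builds a baseline demographics table from the ADSL dataset.",
        "adverse events": "Write an R program that builds an adverse-event summary table from the ADAE dataset.",
    }

    def route_query(user_query: str) -> str:
        # Match the user's request to a predefined prompt; a real agent could use an
        # LLM or embedding similarity instead of plain substring matching.
        lowered = user_query.lower()
        for keyword, prompt in PREDEFINED_PROMPTS.items():
            if keyword in lowered:
                return prompt
        raise KeyError("no predefined TFL prompt matches this query")

    print(route_query("Please generate the demographics table for study X"))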
Citations: 0
From Lists to Emojis: How Format Bias Affects Model Alignment
Pub Date : 2024-09-18 DOI: arxiv-2409.11704
Xuanchang Zhang, Wei Xiong, Lichang Chen, Tianyi Zhou, Heng Huang, Tong Zhang
In this paper, we study format biases in reinforcement learning from human feedback (RLHF). We observe that many widely-used preference models, including human evaluators, GPT-4, and top-ranking models on the RewardBench benchmark, exhibit strong biases towards specific format patterns, such as lists, links, bold text, and emojis. Furthermore, large language models (LLMs) can exploit these biases to achieve higher rankings on popular benchmarks like AlpacaEval and LMSYS Chatbot Arena. One notable example of this is verbosity bias, where current preference models favor longer responses that appear more comprehensive, even when their quality is equal to or lower than shorter, competing responses. However, format biases beyond verbosity remain largely underexplored in the literature. In this work, we extend the study of biases in preference learning beyond the commonly recognized length bias, offering a comprehensive analysis of a wider range of format biases. Additionally, we show that with a small amount of biased data (less than 1%), we can inject significant bias into the reward model. Moreover, these format biases can also be easily exploited by downstream alignment algorithms, such as best-of-n sampling and online iterative DPO, as it is usually easier to manipulate the format than to improve the quality of responses. Our findings emphasize the need to disentangle format and content both for designing alignment algorithms and evaluating models.
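To make the exploitation mechanism concrete, here is a toy best-of-n sampling sketch with an intentionally format-biased reward function. Both the reward and the candidate responses are contrived for illustration; they are not the models studied in the paper.

    # Toy reward that, like the biased preference models described above, pays a
    # bonus for list formatting and emojis on top of a crude content score.
    def biased_reward(response: str) -> float:
        content_score = 0.1 * len(set(response.lower().split()))
        format_bonus = 2.0 * response.count("- ") + 1.0 * response.count("😀")
        return content_score + format_bonus

    def best_of_n(candidates):
        # Best-of-n sampling returns the candidate the reward model ranks highest.
        return max(candidates, key=biased_reward)

    candidates = [
        "Paris is the capital of France.",
        "- Paris is the capital of France 😀\n- It sits on the Seine 😀",
    ]
    print(best_of_n(candidates))  # the list-and-emoji answer wins despite similar substance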
Citations: 0
Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network
Pub Date : 2024-09-18 DOI: arxiv-2409.11677
Jiale Wang, Junhui Yu, Huanyong Liu, Chenanran Kong
Hierarchical and complex Mathematical Expression Recognition (MER) is challenging due to multiple possible interpretations of a formula, complicating both parsing and evaluation. In this paper, we introduce the Hierarchical Detail-Focused Recognition dataset (HDR), the first dataset specifically designed to address these issues. It consists of a large-scale training set, HDR-100M, offering unprecedented scale and diversity with one hundred million training instances, and a test set, HDR-Test, which includes multiple interpretations of complex hierarchical formulas for comprehensive model performance evaluation. Additionally, the parsing of complex formulas often suffers from errors in fine-grained details. To address this, we propose the Hierarchical Detail-Focused Recognition Network (HDNet), an innovative framework that incorporates a hierarchical sub-formula module, focusing on the precise handling of formula details, thereby significantly enhancing MER performance. Experimental results demonstrate that HDNet outperforms existing MER models across various datasets.
Citations: 0
Linguini: A benchmark for language-agnostic linguistic reasoning
Pub Date : 2024-09-18 DOI: arxiv-2409.12126
Eduardo Sánchez, Belen Alastruey, Christophe Ropers, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà
We propose a new benchmark to measure a language model's linguistic reasoning skills without relying on pre-existing language-specific knowledge. The test covers 894 questions grouped in 160 problems across 75 (mostly) extremely low-resource languages, extracted from the International Linguistic Olympiad corpus. To attain high accuracy on this benchmark, models don't need previous knowledge of the tested language, as all the information needed to solve the linguistic puzzle is presented in the context. We find that, while all analyzed models rank below 25% accuracy, there is a significant gap between open and closed models, with the best-performing proprietary model at 24.05% and the best-performing open model at 8.84%.
Citations: 0
DocMamba: Efficient Document Pre-training with State Space Model
Pub Date : 2024-09-18 DOI: arxiv-2409.11887
Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Shuhang Liu, Jun Du, Jianshu Zhang
In recent years, visually-rich document understanding has attracted increasing attention. Transformer-based pre-trained models have become the mainstream approach, yielding significant performance gains in this field. However, the self-attention mechanism's quadratic computational complexity hinders their efficiency and ability to process long documents. In this paper, we present DocMamba, a novel framework based on the state space model. It is designed to reduce computational complexity to linear while preserving global modeling capabilities. To further enhance its effectiveness in document processing, we introduce the Segment-First Bidirectional Scan (SFBS) to capture contiguous semantic information. Experimental results demonstrate that DocMamba achieves new state-of-the-art results on downstream datasets such as FUNSD, CORD, and SROIE, while significantly improving speed and reducing memory usage. Notably, experiments on HRDoc confirm DocMamba's potential for length extrapolation. The code will be available online.
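For intuition about the linear-complexity claim, the sketch below runs a minimal one-dimensional state-space recurrence: each step updates a single hidden state, so cost grows linearly with sequence length, in contrast to the quadratic pairwise scores of self-attention. This is a didactic toy, not DocMamba's actual selective state-space layer.

    import numpy as np

    def ssm_scan(x, a=0.9, b=0.1, c=1.0):
        # h_t = a * h_{t-1} + b * x_t ;  y_t = c * h_t
        # One pass over the sequence: O(L) time with O(1) state, versus the O(L^2)
        # attention matrix a Transformer layer would materialize.
        h, ys = 0.0, []
        for x_t in x:
            h = a * h + b * float(x_t)
            ys.append(c * h)
        return np.array(ys)

    tokens = np.arange(8.0)  # stand-in for a sequence of token features
    print(ssm_scan(tokens))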
Citations: 0
Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources
Pub Date : 2024-09-18 DOI: arxiv-2409.11783
Issey Sukeda
The recent success of large language models (LLMs) and the scaling law has led to a widespread adoption of larger models. Particularly in the healthcare industry, there is an increasing demand for locally operated LLMs due to security concerns. However, the majority of high-quality open-source LLMs have a size of 70B parameters, imposing significant financial burdens on users for GPU preparation and operation. To overcome these issues, we present a medical adaptation based on the recent 7B models, which enables operation in low computational resources. We compare the performance on medical question-answering benchmarks in two languages (Japanese and English), demonstrating that its scores reach parity with or surpass those of currently existing medical LLMs that are ten times larger. We find that fine-tuning an English-centric base model on a Japanese medical dataset improves the score in both languages, supporting the effect of cross-lingual knowledge transfer. We hope that this study will alleviate financial challenges, serving as a stepping stone for clinical institutions to practically utilize LLMs locally. Our evaluation code is available at https://huggingface.co/stardust-coder/jmedllm-7b-v1.
Citations: 0
"A Woman is More Culturally Knowledgeable than A Man?": The Effect of Personas on Cultural Norm Interpretation in LLMs "女人比男人更懂文化?人格对法律硕士文化规范解释的影响
Pub Date : 2024-09-18 DOI: arxiv-2409.11636
Mahammed Kamruzzaman, Hieu Nguyen, Nazmul Hassan, Gene Louis Kim
As the deployment of large language models (LLMs) expands, there is an increasing demand for personalized LLMs. One method to personalize and guide the outputs of these models is by assigning a persona -- a role that describes the expected behavior of the LLM (e.g., a man, a woman, an engineer). This study investigates whether an LLM's understanding of social norms varies across assigned personas. Ideally, the perception of a social norm should remain consistent regardless of the persona, since acceptability of a social norm should be determined by the region the norm originates from, rather than by individual characteristics such as gender, body size, or race. A norm is universal within its cultural context. In our research, we tested 36 distinct personas from 12 sociodemographic categories (e.g., age, gender, beauty) across four different LLMs. We find that LLMs' cultural norm interpretation varies based on the persona used, and the norm interpretation also varies within a sociodemographic category (e.g., a fat person and a thin person in the physical appearance group), where an LLM with the more socially desirable persona (e.g., a thin person) interprets social norms more accurately than with the less socially desirable persona (e.g., a fat person). We also discuss how different types of social biases may contribute to the results that we observe.
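The experimental setup can be pictured with a small prompt-construction sketch: the norm question is held fixed while only the assigned persona changes. The personas and question below are illustrative, not the paper's exact materials.

    # Hypothetical personas spanning a few sociodemographic categories.
    PERSONAS = ["a man", "a woman", "an engineer", "a thin person", "a fat person"]

    def persona_prompt(persona: str, norm_question: str) -> str:
        # Only the persona prefix varies; an unbiased model should answer the same way.
        return (f"You are {persona}. {norm_question} "
                "Answer with 'acceptable' or 'unacceptable'.")

    question = "In the culture this norm comes from, is it acceptable to decline food offered by a host?"
    for persona in PERSONAS:
        print(persona_prompt(persona, question))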
Citations: 0
BERT-VBD: Vietnamese Multi-Document Summarization Framework
Pub Date : 2024-09-18 DOI: arxiv-2409.12134
Tuan-Cuong Vuong, Trang Mai Xuan, Thien Van Luong
In tackling the challenge of Multi-Document Summarization (MDS), numerous methods have been proposed, spanning both extractive and abstractive summarization techniques. However, each approach has its own limitations, making it less effective to rely solely on either one. An emerging and promising strategy involves a synergistic fusion of extractive and abstractive summarization methods. Despite the plethora of studies in this domain, research on the combined methodology remains scarce, particularly in the context of Vietnamese language processing. This paper presents a novel Vietnamese MDS framework leveraging a two-component pipeline architecture that integrates extractive and abstractive techniques. The first component employs an extractive approach to identify key sentences within each document. This is achieved by a modification of the pre-trained BERT network, which derives semantically meaningful phrase embeddings using siamese and triplet network structures. The second component utilizes the VBD-LLaMA2-7B-50b model for abstractive summarization, ultimately generating the final summary document. Our proposed framework demonstrates a positive performance, attaining ROUGE-2 scores of 39.6% on the VN-MDS dataset and outperforming the state-of-the-art baselines.
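As a rough picture of the extractive component, the sketch below ranks sentence embeddings by cosine similarity to the document centroid and keeps the top-k. The embeddings are random stand-ins for the siamese-BERT phrase embeddings the paper derives; the selection rule is a generic illustration, not the authors' exact scoring method.

    import numpy as np

    def extract_key_sentences(sentence_embeddings: np.ndarray, k: int = 3) -> np.ndarray:
        # Score each sentence by cosine similarity to the mean (centroid) embedding
        # and return the indices of the k highest-scoring sentences.
        centroid = sentence_embeddings.mean(axis=0)
        denom = np.linalg.norm(sentence_embeddings, axis=1) * np.linalg.norm(centroid)
        scores = sentence_embeddings @ centroid / np.where(denom == 0, 1.0, denom)
        return np.argsort(-scores)[:k]

    rng = np.random.default_rng(0)
    embeddings = rng.random((10, 384))  # 10 sentences, hypothetical 384-dim embeddings
    print(extract_key_sentences(embeddings))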
Citations: 0