在为自然语言处理而设计的基础设施内实施语言模型

Bartosz Walkowiak, Tomasz Walkowiak
{"title":"在为自然语言处理而设计的基础设施内实施语言模型","authors":"Bartosz Walkowiak, Tomasz Walkowiak","doi":"10.24425/ijet.2024.149525","DOIUrl":null,"url":null,"abstract":"This paper explores cost-effective alternatives for resource-constrained environments in the context of language models by investigating methods such as quantization and CPUbased model implementations. The study addresses the computational efficiency of language models during inference and the development of infrastructure for text document processing. The paper discusses related technologies, the CLARIN-PL infrastructure architecture, and implementations of small and large language models. The emphasis is on model formats, data precision, and runtime environments (GPU and CPU). It identifies optimal solutions through extensive experimentation. In addition, the paper advocates for a more comprehensive performance evaluation approach. Instead of reporting only average token throughput, it suggests considering the curve’s shape, which can vary from constant to monotonically increasing or decreasing functions. Evaluating token throughput at various curve points, especially for different output token counts, provides a more informative perspective.","PeriodicalId":13922,"journal":{"name":"International Journal of Electronics and Telecommunications","volume":null,"pages":null},"PeriodicalIF":0.5000,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Implementation of language models within an infrastructure designed for Natural Language Processing\",\"authors\":\"Bartosz Walkowiak, Tomasz Walkowiak\",\"doi\":\"10.24425/ijet.2024.149525\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper explores cost-effective alternatives for resource-constrained environments in the context of language models by investigating methods such as quantization and CPUbased model implementations. The study addresses the computational efficiency of language models during inference and the development of infrastructure for text document processing. The paper discusses related technologies, the CLARIN-PL infrastructure architecture, and implementations of small and large language models. The emphasis is on model formats, data precision, and runtime environments (GPU and CPU). It identifies optimal solutions through extensive experimentation. In addition, the paper advocates for a more comprehensive performance evaluation approach. Instead of reporting only average token throughput, it suggests considering the curve’s shape, which can vary from constant to monotonically increasing or decreasing functions. Evaluating token throughput at various curve points, especially for different output token counts, provides a more informative perspective.\",\"PeriodicalId\":13922,\"journal\":{\"name\":\"International Journal of Electronics and Telecommunications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2024-03-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Electronics and Telecommunications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.24425/ijet.2024.149525\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"TELECOMMUNICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Electronics and Telecommunications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.24425/ijet.2024.149525","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
引用次数: 0

摘要

本文通过研究量化和基于 CPU 的模型实现等方法,探讨了语言模型在资源受限环境下的成本效益替代方案。研究涉及语言模型在推理过程中的计算效率以及文本文档处理基础设施的开发。论文讨论了相关技术、CLARIN-PL 基础架构以及小型和大型语言模型的实现。重点是模型格式、数据精度和运行环境(GPU 和 CPU)。论文通过大量实验确定了最佳解决方案。此外,论文还提倡采用更全面的性能评估方法。它建议考虑曲线的形状,而不是只报告平均令牌吞吐量,因为曲线的形状可以从恒定函数到单调递增或递减函数。评估不同曲线点的令牌吞吐量,尤其是不同输出令牌数的令牌吞吐量,可以提供更多信息。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Implementation of language models within an infrastructure designed for Natural Language Processing
This paper explores cost-effective alternatives for resource-constrained environments in the context of language models by investigating methods such as quantization and CPUbased model implementations. The study addresses the computational efficiency of language models during inference and the development of infrastructure for text document processing. The paper discusses related technologies, the CLARIN-PL infrastructure architecture, and implementations of small and large language models. The emphasis is on model formats, data precision, and runtime environments (GPU and CPU). It identifies optimal solutions through extensive experimentation. In addition, the paper advocates for a more comprehensive performance evaluation approach. Instead of reporting only average token throughput, it suggests considering the curve’s shape, which can vary from constant to monotonically increasing or decreasing functions. Evaluating token throughput at various curve points, especially for different output token counts, provides a more informative perspective.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
1.50
自引率
14.30%
发文量
0
审稿时长
12 weeks
期刊最新文献
Bandwidth enhancement of circular structure microstrip antenna based on inverted C-shaped ground configuration Experimental validation of asymmetric PZT optimal shape in the active vibration reduction of triangular plates Mobile (wireless) telecommunication sector: an Indian perspective and PESTLE analysis Enhanced optimization model decision efficient multi product retail Subjective tests of speaker recognition for selected voice disguise techniques
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1