Automated Speech Recognition: Spurring Artificial Intelligence Innovation [Circuits from a Systems Perspective]

IEEE Solid-State Circuits Magazine, published 2024-11-13, DOI: 10.1109/MSSC.2024.3473747

Farhana Sheikh

Vol. 16, No. 4, pp. 29-116

Abstract

Roughly 25 years ago, it was rare for an individual to interact with a voice-activated service on a daily basis. Since then, the number of devices that use the human voice as input has risen exponentially. A study published in Forbes magazine [1] estimates that by 2025 the global voice recognition market will reach US$26.8 billion. Today, more than one in four people regularly use voice search, and by the end of 2024 the number of digital voice assistants in the world is estimated to reach 8.4 billion [1], slightly more than the world's total population. As the use of voice-activated assistants rises, commercial transactions made through such devices are also expected to increase, reaching US$164 billion by 2025 [1]. Interestingly, the technologies that enabled the recent surge in acceptance of voice, rather than text, as input to a computing machine are closely tied to the recent advances in natural language processing (NLP) behind artificial intelligence (AI) engines such as ChatGPT. In this installment of “Circuits From a Systems Perspective,” we briefly review the history of automatic speech recognition (ASR) and show how closely it is intertwined with NLP, which led to the large language models (LLMs) that have spurred the new age of AI. We also review a modern speech recognition system and some of the circuits that could enhance future ASR systems.
