
Latest Articles in Big Data

Cross-Lingual Speech-to-Text Systems with Low-Latency Neural Networks for Real-Time Applications.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-03-19 | DOI: 10.1177/2167647X261430664
Jawad Khan, Muhammad Hameed Siddiqi, Tariq Rahim, Shah Khalid
Citations: 0
MuTemAPR: Enhance Multilocation Patches with Template-Based Neural Program Repair.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-03-18 | DOI: 10.1177/2167647X261428016
Tao Zhang, Yu Zhu

Automated program repair (APR) has been studied extensively in recent years. Existing approaches mainly generate single-position patches that fail to address multilocation faults effectively. While existing multistep repair approaches can iteratively generate patches for each fault position sequentially, their data augmentation methodologies lack rationality and deviate from real-world scenarios. Furthermore, they overlook the interdependencies between faulty statements, leading to patches learned from erroneous contextual patterns. In this article, we propose MuTemAPR, an APR approach that iteratively generates multilocation patches. MuTemAPR incorporates templates with neural machine translation. Specifically, our method introduces three key innovations. First, we design a template-based data augmentation framework that transforms single-line faulty code into multilocation faulty code through 35 mutation templates. It simulates a real-world environment by establishing variable-type mapping tables for more accurate repair augmentation. Second, we propose a reinforced faulty context training method that employs progressive annotation to incrementally learn repair processes from top to bottom in multifault code. Third, we implement a semantic constraint mechanism during training that enforces syntactic and semantic rules through differential analysis between templates, input code, and generated patches. We evaluate MuTemAPR on the widely used Defects4j benchmark. Experimental results demonstrate that our approach can effectively repair multilocation faults, successfully fixing five additional bugs compared with state-of-the-art methods on Defects4j v1.2 and v2.0.
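The template-based augmentation step can be illustrated in miniature. The three operator-swap templates and the `mutate_multilocation` helper below are hypothetical stand-ins for exposition only; the paper itself defines 35 mutation templates plus variable-type mapping tables.

```python
import random

# Hypothetical miniature operator-swap templates (the paper uses 35 richer ones).
TEMPLATES = [
    ("==", "!="),
    ("<", "<="),
    ("+", "-"),
]

def mutate_multilocation(lines, n_faults=2, seed=0):
    """Turn correct code into multilocation faulty code by applying a
    mutation template at several distinct line positions."""
    rng = random.Random(seed)
    mutated = list(lines)
    # Only lines that match at least one template can host a fault.
    candidates = [i for i, ln in enumerate(lines)
                  if any(src in ln for src, _ in TEMPLATES)]
    for i in rng.sample(candidates, min(n_faults, len(candidates))):
        for src, dst in TEMPLATES:
            if src in mutated[i]:
                mutated[i] = mutated[i].replace(src, dst, 1)
                break
    return mutated
```

Running this on a small snippet injects faults at two positions while leaving template-free lines untouched, which is the shape of training pair (faulty code, original code) the augmentation framework produces.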

Citations: 0
Editorial Summary of Selected Articles.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-03-07 | DOI: 10.1177/2167647X251406211
Victor Chang, Péter Kacsuk, Gary Wills, Reinhold Behringer
Citations: 0
Hybrid DeepSentX Framework for AI-Driven Requirements Insight and Risk Prediction in Multilingual Sports Using Natural Language Processing.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-28 | DOI: 10.1177/2167647X251399606
Suhas Alalasandra Ramakrishnaiah, Yasir Abdullah Rabi, Ananth John Patrick, Mohammad Shabaz, Surbhi B Khan, Rijwan Khan, Ahlam Almusharraf

Engineering teams need timely signals about evolving requirements and release risk, yet multilingual fan discourse around live sports is noisy, code-switched, and saturated with sarcasm and event-driven drift. We present Hybrid DeepSentX, an AI-driven framework that converts crowd commentary into actionable requirements insight and sprint-level risk scores. The pipeline couples multilingual transformer encoders with an inductive GraphSAGE conversation graph to inject relational context across posts, and adds a reinforcement learner whose reward is shaped to prioritize correct decisions on sarcasm-heavy items and rapidly shifting events. We assembled a million-plus posts from X, Reddit, and sports forums and evaluated the framework against strong baselines, including BERT, long short-term memory, support-vector machines, and recent hybrid models, with significance tests, calibration analysis, ablations, and efficiency profiling. DeepSentX achieved higher macro-averaged accuracy and F1 on code-switched and sarcastic subsets, reduced missed risk flags, and produced developer-facing artefacts that directly support backlog grooming and defect triage. Relative to prior hybrids that combine transformers with either graph reasoning or reinforcement alone, our contributions are fourfold: (i) a unified multilingual design that integrates transformer, graph, and reinforcement components for sarcasm and drift robustness, (ii) an annotated multi-platform corpus with explicit code switching and sarcasm labels and per platform language balance, (iii) a rigorous comparative study reporting accuracy, calibration, latency, memory, and parameter count, and (iv) deployment artefacts that turn model outputs into requirement clusters and sprint risk scores suitable for continuous planning.
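The graph component's core operation, a GraphSAGE-style mean aggregation over neighboring posts, can be sketched as follows. The weights, features, and adjacency here are toy values for illustration, not the paper's trained model.

```python
import numpy as np

def sage_mean_layer(node_feats, adjacency, W_self, W_neigh):
    """One GraphSAGE mean-aggregator layer:
    h_v = ReLU(W_self @ x_v + W_neigh @ mean(x_u for u in N(v)))."""
    out = []
    for v, neighbors in enumerate(adjacency):
        x_v = node_feats[v]
        if neighbors:
            agg = np.mean([node_feats[u] for u in neighbors], axis=0)
        else:
            agg = np.zeros_like(x_v)  # isolated node: no neighbor signal
        h = W_self @ x_v + W_neigh @ agg
        out.append(np.maximum(h, 0.0))  # ReLU
    return np.stack(out)
```

Because the aggregator is inductive (it learns the weight matrices, not per-node embeddings), it can embed posts from conversations never seen during training, which is what makes it suitable for fast-moving live-sports streams.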

Citations: 0
Unified AI Approach Using Encoding and Generative Large Language Models for Variant Product Matching in e-Commerce.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-28 | DOI: 10.1177/2167647X261423127
Pedro Herrero-Vidal, You-Lin Chen, Cris Liu, Bin Xu, Prithviraj Sen, Lichao Wang

We introduce VARM, variant relationship matcher strategy, to identify pairs of variant products in e-commerce catalogs. Traditional definitions of entity resolution are concerned with whether product mentions refer to the same underlying product. However, this fails to capture product relationships that are critical for e-commerce applications, such as having similar, but not identical, products listed on the same webpage or sharing reviews. Here, we formulate a new type of entity resolution in variant product relationships to capture these similar e-commerce product links. In contrast with the traditional definition, the new definition requires both identifying whether two products are variant matches of each other and what the attributes are that vary between them. To satisfy these two requirements, we developed a strategy that leverages the strengths of both encoding and generative AI models. First, we construct a dataset that captures webpage product links, and therefore variant product relationships, to train an encoding large language model (LLM) to predict variant matches for any given pair of products. Second, we use retrieval-augmented generation-prompted generative LLMs to extract variation and common attributes amongst groups of variant products. To validate our strategy, we evaluated model performance using real data from one of the world's leading e-commerce retailers. The results showed that our strategy outperforms alternative solutions and paves the way to exploiting these new types of product relationships.
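The two requirements can be sketched minimally: a match decision over encoder embeddings, and an attribute diff over structured product data. The embeddings, the 0.9 threshold, and both function names below are hypothetical illustrations, not the paper's model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def variant_match(emb_a, emb_b, threshold=0.9):
    """Requirement 1: decide whether two products are variant matches,
    here approximated by embedding similarity above a tuned threshold."""
    return cosine(emb_a, emb_b) >= threshold

def varying_attributes(attrs_a: dict, attrs_b: dict):
    """Requirement 2: report which attributes differ between the pair
    (the role the generative LLM plays over unstructured listings)."""
    keys = set(attrs_a) | set(attrs_b)
    return {k for k in keys if attrs_a.get(k) != attrs_b.get(k)}
```

For example, two listings differing only in `color` would match under requirement 1 while requirement 2 reports `{"color"}` as the variation attribute.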

Citations: 0
Advancing Dysarthric Speech-to-Text Recognition with LATTE: A Low-Latency Acoustic Modeling Approach for Real-Time Communication.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-09 | DOI: 10.1177/2167647X251411174
Qurat Ul Ain, Hammad Afzal, Fazli Subhan, Mazliham Mohd Suud, Younhyun Jung

Dysarthria, a motor speech disorder characterized by slurred and often unintelligible speech, presents substantial challenges for effective communication. Conventional automatic speech recognition systems frequently underperform on dysarthric speech, particularly in severe cases. To address this gap, we introduce low-latency acoustic transcription and textual encoding (LATTE), an advanced framework designed for real-time dysarthric speech recognition. LATTE integrates preprocessing, acoustic processing, and transcription mapping into a unified pipeline, with its core powered by a hybrid architecture that combines convolutional layers for acoustic feature extraction with bidirectional temporal layers for modeling temporal dependencies. Evaluated on the UA-Speech dataset, LATTE achieves a word error rate of 12.5%, phoneme error rate of 8.3%, and a character error rate of 1%. By enabling accurate, low-latency transcription of impaired speech, LATTE provides a robust foundation for enhancing communication and accessibility in both digital applications and real-time interactive environments.
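The reported metrics are standard edit-distance rates. A minimal reference implementation of word and character error rate (not the paper's code) looks like this:

```python
def _edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (classic DP)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)]

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: edit distance over word tokens, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return _edit_distance(ref, hyp) / max(len(ref), 1)

def char_error_rate(reference: str, hypothesis: str) -> float:
    """CER: the same distance computed over characters."""
    return _edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

Phoneme error rate is the same computation over phoneme sequences instead of words or characters.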

Citations: 0
Real-Time Named Entity Recognition from Textual Electronic Clinical Records in Cancer Therapy Using Low-Latency Neural Networks.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-06 | DOI: 10.1177/2167647X251409135
Pir Noman Ahmad, Muhammad Shahid Anwar, Saleha Masood, Atta Ur Rehman, Muhammad Zubair

Named entity recognition (NER) is a core task in natural language processing that identifies and classifies entities, such as people, organizations, and locations within text. It has traditionally been applied in areas like text summarization, machine translation, and question answering. In recent years, NER has gained growing importance in health care, where electronic clinical records and online platforms generate large amounts of unstructured medical data. However, applying NER in clinical contexts introduces unique challenges due to the complexity of medical terminology and the need for high accuracy. In this study, we focused on the development of a real-time, low-latency NER system designed for cross-lingual speech-to-text applications, with a particular emphasis on cancer therapy-related clinical records and traditional Chinese medicine (TCM). We explored the integration of deep learning (DL) architectures optimized for low-latency neural processing to extract structured information from multilingual spoken content in medical settings, particularly in multimodal environments. We evaluate DL-based methods and propose a semi-supervised approach that combines TCM-specific corpora with biomedical resources to improve recognition accuracy. The findings provide both a systematic review of current methods and practical insights for building real-time clinical applications that support decision-making and information management in health care.
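One common bootstrap step for semi-supervised clinical NER is longest-match lexicon lookup over corpus-derived term lists. A minimal sketch with a hypothetical three-entry lexicon follows (real systems would draw on TCM and biomedical corpora, as the abstract describes, and feed these weak labels into a neural model):

```python
import re

# Hypothetical miniature clinical lexicon mapping surface terms to entity types.
LEXICON = {
    "cisplatin": "DRUG",
    "lung cancer": "DISEASE",
    "ginseng": "TCM_HERB",
}

def tag_entities(text):
    """Longest-match dictionary lookup: scan for lexicon terms (longest
    first) and return (start, end, label) spans over the lowercased text."""
    found = []
    lowered = text.lower()
    for term, label in sorted(LEXICON.items(), key=lambda kv: -len(kv[0])):
        for m in re.finditer(re.escape(term), lowered):
            found.append((m.start(), m.end(), label))
    return sorted(found)
```

Spans produced this way can seed training data for the low-latency neural tagger without requiring full manual annotation.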

Citations: 0
Perceived Usefulness, Trust, and Behavioral Intention: A Study on College Student User Adoption Behaviors of Artificial Intelligence Generated News Based on Technology Acceptance Model.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-01 | Epub Date: 2026-02-09 | DOI: 10.1177/2167647X261423109
Xianfeng Gong, Mingyang Mao

This study intends to identify the critical factors that shape college students' adoption of AI-generated news, with a specific focus on integrating Big Data methodologies into the Technology Acceptance Model (TAM) framework. Building on TAM, the research incorporates "trust" as a core variable to develop a dual-path theoretical model that combines technological cognition (e.g., perceived usefulness, perceived ease of use) and psychological emotions. Unlike traditional TAM-based studies relying solely on questionnaire data, this research enriches its data sources by leveraging Big Data techniques-including the collection and analysis of college students' real-time behavioral data (e.g., AI news reading duration, sharing frequency, source verification clicks) and unstructured text data (e.g., sentiment orientation in comment sections)-to complement the survey data from 300 college students. Through a questionnaire survey of 300 college students and data analysis using the structural equation model, the study found that trust has the strongest direct positive impact on the willingness to use (β = 0.49, p < 0.001), and its influence is significantly greater than perceived usefulness (β = 0.35, p < 0.001). Meanwhile, although perceived ease of use does not directly affect the willingness to use, it has significant indirect effects by enhancing trust and perceived usefulness. The results show that in the AI news context with high-risk perception, trust is a more crucial psychological mechanism than traditional technological cognitive factors. These findings have expanded the explanatory boundaries of the TAM model in new technology fields and provided empirical evidence and practical inspiration for AI developers to optimize system credibility and for educators to conduct algorithmic literacy training.
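In a structural equation model, an indirect effect such as perceived ease of use acting through trust is the product of the two direct path coefficients, and the total effect adds any direct path. A minimal sketch with hypothetical coefficients (not the study's estimates):

```python
def indirect_effect(path_a: float, path_b: float) -> float:
    """Mediated (indirect) effect: product of the two direct paths,
    e.g. (ease of use -> trust) * (trust -> intention)."""
    return path_a * path_b

def total_effect(direct: float, path_a: float, path_b: float) -> float:
    """Total effect on the outcome: direct path plus the mediated path."""
    return direct + indirect_effect(path_a, path_b)
```

This is why perceived ease of use can matter in the model even with no significant direct path to intention: its entire contribution flows through the mediators.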

Citations: 0
TL-TransUNet: An Improved Lightweight Semantic Segmentation Model of Macular Edema Lesions in Retinal OCT Images.
IF 2.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-01 | Epub Date: 2026-03-23 | DOI: 10.1177/2167647X261429851
Zhijun Gao, Yishuai Yang, Jinhuan Wang, Xin Yue

Optical coherence tomography (OCT) is a noncontact, noninvasive imaging modality that provides high-resolution, real-time visualization of retinal layers and fluid accumulations, making it central to diagnosing and monitoring retinal diseases such as diabetic macular edema (DME). However, retinal fluid segmentation faces several challenges, including variations in fluid size, location, and shape, as well as complex, irregular boundaries. To address these issues, we propose TL-TransUNet, a novel lightweight segmentation model based on TransUNet. The model incorporates a hybrid self-attention mechanism that combines linear self-attention with residual filtered multilayer perceptron modules, reducing both parameter count and computational complexity while capturing the global relationships and local details needed to segment small lesions. Furthermore, the decoder employs wavelet convolution, using the wavelet transform to extract multi-scale features from low- to high-frequency components and thereby strengthening the model's multi-scale learning capability. Experimental results on a public DME dataset demonstrate that the proposed method outperforms several mainstream segmentation approaches.

{"title":"TL-TransUNet: An Improved Lightweight Semantic Segmentation Model of Macular Edema Lesions in Retinal OCT Images.","authors":"Zhijun Gao, Yishuai Yang, Jinhuan Wang, Xin Yue","doi":"10.1177/2167647X261429851","DOIUrl":"https://doi.org/10.1177/2167647X261429851","url":null,"abstract":"<p><p>Optical coherence tomography (OCT) offers significant advantages of noncontact operation, high resolution, and real-time imaging, making it particularly suitable for acquiring human retinal images and playing a crucial role in diagnosing and monitoring retinal diseases such as diabetic macular edema (DME). OCT is a key noninvasive imaging modality for retinal diseases such as DME, offering high-resolution visualization of retinal layers and fluid accumulations. However, retinal fluid segmentation faces several challenges including variations in fluid size, location, and shape, as well as complex irregular boundaries. To address these issues, we propose TL-TransUNet, a novel lightweight segmentation model based on TransUNet. The model incorporates a hybrid self-attention mechanism that effectively combines linear self-attention with residual filtered multilayer perceptron modules, reducing both parameter size and computational complexity while capturing global relationships and local details to improve segmentation performance for small lesions. Furthermore, the decoder employs wavelet convolution that utilizes wavelet transform to extract multi-scale features from low- to high-frequency components, enhancing the model's multi-scale learning capability. 
Experimental results on a public DME dataset demonstrate that our proposed method outperforms several mainstream segmentation approaches, demonstrating superior performance.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"14 1","pages":"29-41"},"PeriodicalIF":2.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147500742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
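The linear self-attention the abstract refers to trades softmax attention's O(N²) pairwise score matrix for an O(N) kernel formulation. Below is a minimal NumPy sketch of generic kernel-based linear attention using the elu(x)+1 feature map common in the linear-attention literature; it is an illustrative stand-in under that assumption, not the paper's hybrid module, and all shapes and names are arbitrary:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernel-based linear attention: O(N * d^2) cost rather than the
    O(N^2 * d) of softmax attention. Generic sketch only -- not the
    exact hybrid module used by TL-TransUNet."""
    # Positive feature map phi(x) = elu(x) + 1: x + 1 for x > 0,
    # exp(x) otherwise.
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))

    Qf, Kf = phi(Q), phi(K)                    # (N, d) each
    KV = Kf.T @ V                              # (d, d) key/value summary
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T   # (N, 1) normalizer
    return (Qf @ KV) / (Z + eps)               # (N, d)

rng = np.random.default_rng(0)
N, d = 1024, 32                                # sequence length, head dim
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 32)
```

Because keys and values are first condensed into a d×d summary, the per-query cost no longer depends on sequence length, which is what makes this family of mechanisms attractive for lightweight segmentation models.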
Does Context Matter? The Role of Fine-Tuned Contextual Augmentation in Online Ad Delivery on Social Media.
IF 2.6 CAS Tier 4, Computer Science Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2026-02-01 Epub Date: 2025-12-20 DOI: 10.1177/2167647X251398729
Saifullah Jan, Iftikhar Alam, Inayat Khan

This study presents a real-time, context-adaptive advertisement (ad) recommendation framework that dynamically updates user context and uses a multistage ranking and filtering pipeline to deliver highly relevant, personalized ads. Contextual ads improve conversion rates and play a significant role in e-commerce; non-contextual ads, by contrast, frustrate advertisers and users alike, and monetization efforts frequently prove ineffective because of poor engagement, as evidenced by high ad-skipping rates. Current digital-advertising practice still serves non-contextual, irrelevant ads, with correspondingly poor conversion. To address this problem, the article explores semantically enriched, context-aware recommender systems that align ads with user interests. The proposed framework combines a user context extractor (UCE), a recommender system, an ads database, an ads ranker, and an ads filter to deliver real-time, personalized recommendations. The study also examines how high-quality, relevant content and clickable advertising improve customer relationships and reduce ad avoidance: ads made relevant and engaging through contextual augmentation are projected to raise click-through rates in real-world use; reduced ad fatigue and the delivery of relevant content increase customer engagement and satisfaction; and users respond more willingly to ads that suit their interests, so businesses see higher conversions from greater user interaction. Evaluated with a k-nearest-neighbor-based model, the system improved precision (from 0.8275 to 0.9283), recall (from 0.4628 to 0.5201), and normalized discounted cumulative gain (from 0.9906 to 0.9915). These gains show that integrating fine-grained, dynamic user context substantially enhances recommendation quality and user engagement, offering a scalable foundation for intelligent, adaptive advertising systems. The research contributes to future AI-enabled advertising strategies that pair dynamic ad targeting with personalization to improve conversion rates.

{"title":"Does Context Matter? The Role of Fine-Tuned Contextual Augmentation in Online Ad Delivery on Social Media.","authors":"Saifullah Jan, Iftikhar Alam, Inayat Khan","doi":"10.1177/2167647X251398729","DOIUrl":"10.1177/2167647X251398729","url":null,"abstract":"<p><p>This study presents a real-time, context-adaptive advertisement (ad in short) recommendation framework that dynamically updates user context and utilizes a multistage ranking and filtering pipeline to deliver highly relevant and personalized ads. Contextual ads contribute to better conversion rates and play a significant role in e-commerce. In contrast, non-contextual ads engender frustration among advertisers and users: commercialization efforts frequently prove ineffective due to poor user engagement, as evidenced by high ad-skipping rates. The current practices in digital advertising involve non-contextual and irrelevant ads, which result in poor conversion rates. To address this problem, this article explores semantically enriched and context-aware recommender systems, aiming to align ads with user interests. The proposed framework investigates various components, including a user context extractor (UCE), recommender system, ads database, ads ranker, and ads filter. This study also explores how high-quality and relevant content, along with clickable advertising, contributes to improving customer relationships and reducing ad avoidance. During contextual augmentation, ads that become relevant and engaging are projected to have increased click-through rates in a real-world application. Customer engagement and satisfaction would also increase due to a reduction in ad fatigue and the delivery of relevant content. Furthermore, it can curb ad avoidance because users will gladly respond to ads that suit their interests. Businesses make higher conversions because the more relevant recommendation means greater user interaction. 
The proposed framework combines a UCE, an ad database, a ranking mechanism, and a filtering module to deliver real-time, personalized recommendations. Evaluated using a <i>k</i>-nearest neighbor-based model, the system achieved improved precision (from 0.8275 to 0.9283), recall (from 0.4628 to 0.5201), and normalized discounted cumulative gain (from 0.9906 to 0.9915). These gains demonstrate that integrating fine-grained, dynamic user context substantially enhances recommendation quality and user engagement, offering a scalable foundation for intelligent, adaptive advertising systems. This research contributes toward the future development of an AI-enabled advertising strategy, with an emphasis on dynamic ad targeting that goes hand in hand with personalization and thus improved conversion rate.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"13-28"},"PeriodicalIF":2.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145859048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
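The precision, recall, and normalized discounted cumulative gain figures reported above are standard ranking metrics. A minimal sketch of how each is computed for a single ranked list with binary relevance; the ad IDs and relevance set are illustrative, not data from the study:

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items recovered in the top-k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

# Toy ranked ad list and ground-truth relevant set (illustrative only).
ranked = ["ad3", "ad1", "ad7", "ad2", "ad9"]
relevant = {"ad1", "ad2", "ad5"}
k = 5
print(precision_at_k(ranked, relevant, k))  # 0.4
print(recall_at_k(ranked, relevant, k))
print(round(ndcg_at_k(ranked, relevant, k), 4))
```

Note that NDCG rewards placing relevant items early via the logarithmic position discount, which is why the paper's NDCG gain (0.9906 to 0.9915) is small even though precision improves markedly: a near-ideal ordering leaves little headroom in that metric.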
Contact: info@booksci.cn
Book学术 provides a free academic search service covering Chinese- and English-language literature for scholars at home and abroad, committed to the most convenient, high-quality service experience.
Copyright © 2023 Book学术 All rights reserved.
京公网安备 11010802042870号 | 京ICP备2023020795号-1