
Machine Learning: Latest Articles

Personalization for web-based services using offline reinforcement learning
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-28 | DOI: 10.1007/s10994-024-06525-y
Pavlos Athanasios Apostolopoulos, Zehui Wang, Hanson Wang, Tenghyu Xu, Chad Zhou, Kittipate Virochsiri, Norm Zhou, Igor L. Markov

Large-scale Web-based services present opportunities for improving UI policies based on observed user interactions. We address challenges of learning such policies through offline reinforcement learning (RL). Deployed in a production system for user authentication in a major social network, our approach significantly improves long-term objectives. We articulate practical challenges, provide insights on training and evaluation of offline RL, and discuss generalizations toward offline RL’s deployment in industry-scale applications.

Citations: 0
Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-27 | DOI: 10.1007/s10994-023-06506-7
Hanh Thi Hong Tran, Matej Martinc, Andraz Repar, Nikola Ljubešić, Antoine Doucet, Senja Pollak

Automatic term extraction (ATE) is a natural language processing task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. In this paper, we treat ATE as a sequence-labeling task and explore the efficacy of XLMR in evaluating cross-lingual and multilingual learning against monolingual learning in the cross-domain ATE context. Additionally, we introduce NOBI, a novel annotation mechanism enabling the labeling of single-word nested terms. Our experiments are conducted on the ACTER corpus, encompassing four domains and three languages (English, French, and Dutch), as well as the RSDO5 Slovenian corpus, encompassing four additional domains. Results indicate that cross-lingual and multilingual models outperform monolingual settings, showcasing improved F1-scores for all languages within the ACTER dataset. When incorporating an additional Slovenian corpus into the training set, the multilingual model exhibits superior performance compared to state-of-the-art approaches in specific scenarios. Moreover, the newly introduced NOBI labeling mechanism significantly enhances the classifier’s capacity to extract short nested terms, leading to substantial improvements in Recall for the ACTER dataset and consequently boosting the overall F1-score performance.
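
To make the sequence-labeling framing concrete, here is a minimal sketch of plain BIO tagging for term spans. The abstract does not spell out the NOBI scheme, so this is not NOBI; it only shows why a flat B/I/O tag sequence drops terms nested inside a longer term. The function name `bio_tags`, the token list, and the spans are hypothetical.

```python
def bio_tags(tokens, term_spans):
    """Convert (start, end) token spans into flat B/I/O tags.

    Spans are half-open [start, end). Longest spans are placed first,
    and any span overlapping an already tagged token is skipped, which
    is exactly how flat BIO tagging loses nested terms.
    """
    tags = ["O"] * len(tokens)
    for start, end in sorted(term_spans, key=lambda s: s[0] - s[1]):
        if all(tag == "O" for tag in tags[start:end]):
            tags[start] = "B"
            for i in range(start + 1, end):
                tags[i] = "I"
    return tags


# Hypothetical example: "extraction" is a single-word term nested
# inside the longer term "automatic term extraction".
tokens = ["automatic", "term", "extraction", "uses", "corpora"]
spans = [(0, 3), (2, 3), (4, 5)]
tags = bio_tags(tokens, spans)
print(list(zip(tokens, tags)))
```

In this sketch the nested span `(2, 3)` leaves no trace in the output tags, which is the kind of gap a nested-term annotation scheme is meant to close.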

Citations: 0
Structure discovery in PAC-learning by random projections
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-26 | DOI: 10.1007/s10994-024-06531-0

Abstract

High dimensional learning is data-hungry in general; however, many natural data sources and real-world learning problems possess some hidden low-complexity structure that permits effective learning from relatively small sample sizes. We are interested in the general question of how to discover and exploit such hidden benign traits when problem-specific prior knowledge is insufficient. In this work, we address this question through random projection’s ability to expose structure. We study both compressive learning and high dimensional learning from this angle by introducing the notions of compressive distortion and compressive complexity. We give user-friendly PAC bounds in the agnostic setting that are formulated in terms of these quantities, and we show that our bounds can be tight when these quantities are small. We then instantiate these quantities in several examples of particular learning problems, demonstrating their ability to discover interpretable structural characteristics that make high dimensional instances of these problems solvable to good approximation in a random linear subspace. In the examples considered, these turn out to resemble some familiar benign traits such as the margin, the margin distribution, the intrinsic dimension, the spectral decay of the data covariance, or the norms of parameters, while our general notions of compressive distortion and compressive complexity serve to unify these, and may be used to discover benign structural traits for other PAC-learnable problems.
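
As a rough illustration of why a random linear subspace can retain structure, the sketch below projects points with low intrinsic dimension through a Gaussian random matrix and compares pairwise distances before and after. It is a generic Johnson-Lindenstrauss-style demonstration, not the paper's compressive distortion or compressive complexity; all names and sizes (`distance_ratios`, `d=200`, `k=64`, `intrinsic_dim=3`) are assumptions chosen for the example.

```python
import math
import random


def gaussian_projection(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Apply the projection matrix to a single vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def distance_ratios(d=200, k=64, intrinsic_dim=3, n_points=10, seed=0):
    """Ratios of projected to original pairwise distances."""
    rng = random.Random(seed)
    # Points live in a random low-dimensional subspace of R^d.
    basis = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(intrinsic_dim)]
    points = []
    for _ in range(n_points):
        coeffs = [rng.gauss(0.0, 1.0) for _ in range(intrinsic_dim)]
        points.append([sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(d)])
    matrix = gaussian_projection(d, k, rng)
    low = [project(matrix, p) for p in points]
    ratios = []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            ratios.append(dist(low[i], low[j]) / dist(points[i], points[j]))
    return ratios


ratios = distance_ratios()
print(f"min ratio {min(ratios):.3f}, max ratio {max(ratios):.3f}")
```

Ratios close to 1 indicate that distances, and with them quantities such as margins, are roughly preserved in the compressed space.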

Citations: 0
When are they coming? Understanding and forecasting the timeline of arrivals at the FC Barcelona stadium on match days
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-26 | DOI: 10.1007/s10994-023-06499-3
Feliu Serra-Burriel, Pedro Delicado, Fernando M. Cucchietti, Eduardo Graells-Garrido, Alex Gil, Imanol Eguskiza

Futbol Club Barcelona operates the largest stadium in Europe (with a seating capacity of almost one hundred thousand people) and manages recurring sports events. These are influenced by multiple conditions (time and day of the week, weather, adversary) and affect city dynamics, e.g., peak demand for related services like public transport and stores. We study fine-grained audience entrances at the stadium segregated by visitor type and gate to gain insights and predict the arrival behavior of future games, with a direct impact on the organizational performance and productivity of the business. We can forecast the timeline of arrivals at gate level 72 h prior to kickoff, facilitating operational and organizational decision-making by anticipating potential agglomerations and audience behavior. Furthermore, we can identify patterns for different types of visitors and understand how relevant factors affect them. These findings directly impact commercial and business interests and can alter operational logistics, venue management, and safety.

Citations: 0
Bounding the Rademacher complexity of Fourier neural operators
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-26 | DOI: 10.1007/s10994-024-06533-y
Taeyoung Kim, Myungjoo Kang

Recently, several types of neural operators have been developed, including deep operator networks, graph neural operators, and multiwavelet-based operators. Compared with these models, the Fourier neural operator (FNO), a physics-inspired machine learning method, is computationally efficient and can learn nonlinear operators between function spaces independent of a certain finite basis. This study investigates the bounding of the Rademacher complexity of the FNO based on specific group norms. Using capacity based on these norms, we bound the generalization error of the model. In addition, we investigate the correlation between the empirical generalization error and the proposed capacity of FNO. We infer that the type of group norm determines the information about the weights and architecture of the FNO model stored in capacity. The experimental results offer insight into the impact of the number of modes used in the FNO model on the generalization error. The results confirm that our capacity is an effective index for estimating generalization errors.
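
For orientation, the standard textbook definition of empirical Rademacher complexity and the generic generalization bound it yields are reproduced below. These are not the paper's FNO-specific results. The symbols are assumptions for this note: a sample $S = (z_1, \dots, z_n)$ drawn i.i.d., a class $\mathcal{G}$ of functions taking values in $[0, 1]$ (for instance a loss composed with a hypothesis class), and a confidence level $\delta \in (0, 1)$.

```latex
\[
\widehat{\mathfrak{R}}_S(\mathcal{G})
  = \mathbb{E}_{\sigma}\!\left[
      \sup_{g \in \mathcal{G}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, g(z_i)
    \right],
\qquad
\sigma_1, \dots, \sigma_n \ \text{i.i.d. uniform on } \{-1, +1\}.
\]

% With probability at least 1 - \delta over the draw of S,
% simultaneously for all g in \mathcal{G}:
\[
\mathbb{E}\,[g(z)]
  \;\le\; \frac{1}{n} \sum_{i=1}^{n} g(z_i)
  \;+\; 2\,\widehat{\mathfrak{R}}_S(\mathcal{G})
  \;+\; 3 \sqrt{\frac{\log(2/\delta)}{2n}} .
\]
```

A bound on the Rademacher complexity in terms of weight norms therefore translates directly into a bound on the gap between expected and empirical error.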

Citations: 0
Ijuice: integer JUstIfied counterfactual explanations
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-26 | DOI: 10.1007/s10994-024-06530-1
Alejandro Kuratomi, Ioanna Miliou, Zed Lee, Tony Lindgren, Panagiotis Papapetrou

Counterfactual explanations modify the feature values of an instance in order to alter its prediction from an undesired to a desired label. As such, they are highly useful for providing trustworthy interpretations of decision-making in domains where complex and opaque machine learning algorithms are utilized. To guarantee their quality and promote user trust, they need to satisfy the faithfulness desideratum, when supported by the data distribution. We hereby propose a counterfactual generation algorithm for mixed-feature spaces that prioritizes faithfulness through k-justification, a novel counterfactual property introduced in this paper. The proposed algorithm employs a graph representation of the search space and provides counterfactuals by solving an integer program. In addition, the algorithm is classifier-agnostic and is not dependent on the order in which the feature space is explored. In our empirical evaluation, we demonstrate that it guarantees k-justification while showing comparable performance to state-of-the-art methods in feasibility, sparsity, and proximity.

Citations: 0
Gradient boosted trees for evolving data streams
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-22 | DOI: 10.1007/s10994-024-06517-y
Nuwan Gunasekara, Bernhard Pfahringer, Heitor Gomes, Albert Bifet

Gradient Boosting is a widely used machine learning technique that has proven highly effective in batch learning. However, its effectiveness in stream learning contexts lags behind bagging-based ensemble methods, which currently dominate the field. One reason for this discrepancy is the challenge of adapting the booster to a new concept following a concept drift. Resetting the entire booster can lead to significant performance degradation as it struggles to learn the new concept. Resetting only some parts of the booster can be more effective, but identifying which parts to reset is difficult, given that each boosting step builds on the previous prediction. To overcome these difficulties, we propose Streaming Gradient Boosted Trees (Sgbt), which is trained using weighted squared loss elicited in XGBoost. Sgbt exploits trees with a replacement strategy to detect and recover from drifts, thus enabling the ensemble to adapt without sacrificing the predictive performance. Our empirical evaluation of Sgbt on a range of streaming datasets with challenging drift scenarios demonstrates that it outperforms current state-of-the-art methods for evolving data streams.
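
The remark that each boosting step builds on the previous prediction can be seen in a minimal batch sketch of gradient boosting under squared loss, where every new stump is fitted to the residuals left by all earlier ones. This is ordinary batch boosting with decision stumps, not Sgbt, and it has no drift detection; `fit_stump`, `boost`, the learning rate, and the toy data are assumptions made for the illustration.

```python
import math


def fit_stump(xs, residuals):
    """Least-squares decision stump on a single feature."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps sequentially; each one targets the current residuals."""
    predictions = [0.0] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        history.append(sum(r * r for r in residuals) / len(xs))
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return stumps, history


xs = [i / 10 for i in range(60)]
ys = [math.sin(x) for x in xs]
stumps, history = boost(xs, ys)
print(f"training MSE: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because stump number t is trained on residuals that depend on stumps 1 to t-1, discarding an early stump changes the targets of every later one, which is why partial resets after a drift are hard to get right.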

Citations: 0
Optimal clustering from noisy binary feedback
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-22 | DOI: 10.1007/s10994-024-06532-z

Abstract

We study the problem of clustering a set of items from binary user feedback. Such a problem arises in crowdsourcing platforms solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent reCAPTCHA systems, users' clicks (binary answers) can be used to efficiently label images. In our inference problem, items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer selected from a fixed finite set. For each of these items, the user provides a noisy answer whose expectation is determined by the item cluster and the question and by an item-specific parameter characterizing the hardness of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretical lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon the K-means algorithm whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretical error lower bounds, and in turn allocates the budget in an efficient way. The algorithm learns to select items hard to cluster and relevant questions more often. We compare the performance of our algorithms with or without the adaptive selection strategy numerically and illustrate the gain achieved by being adaptive.
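
The uniform-selection setting can be sketched as a small simulation: every item is asked every question the same number of times, the empirical "yes" rates form a feature vector per item, and K-means groups those vectors. This is only an illustration of the setup, not the authors' algorithm. The answer model `0.5 + h * (p - 0.5)` for hardness `h`, the probability table `p_yes`, and all function names are assumptions introduced here.

```python
import random


def simulate_answers(clusters, p_yes, hardness, n_rounds, rng):
    """Return, per item, the empirical 'yes' rate for each question."""
    features = []
    for cluster, h in zip(clusters, hardness):
        row = []
        for p in p_yes[cluster]:
            # Assumed noise model: hardness h shrinks the signal toward 1/2.
            q = 0.5 + h * (p - 0.5)
            yes = sum(rng.random() < q for _ in range(n_rounds))
            row.append(yes / n_rounds)
        features.append(row)
    return features


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rng = random.Random(0)
true_clusters = [i % 2 for i in range(20)]
p_yes = [[0.9, 0.1], [0.1, 0.9]]  # rows: clusters, columns: questions
hardness = [rng.uniform(0.8, 1.0) for _ in true_clusters]
features = simulate_answers(true_clusters, p_yes, hardness, 100, rng)
labels = kmeans(features, 2)

# Cluster labels are only defined up to a permutation (two clusters here).
mismatches = sum(a != b for a, b in zip(labels, true_clusters))
error_rate = min(mismatches, len(labels) - mismatches) / len(labels)
print("cluster recovery error rate:", error_rate)
```

An adaptive strategy would instead spend more of the answer budget on items whose feature vectors sit close to the boundary between clusters.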

Citations: 0
TOCOL: improving contextual representation of pre-trained language models via token-level contrastive learning
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-18 | DOI: 10.1007/s10994-023-06512-9
Keheng Wang, Chuantao Yin, Rumei Li, Sirui Wang, Yunsen Xian, Wenge Rong, Zhang Xiong

Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens, as they are very close to each other in representation space and thus have higher similarities. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach for low-resource scenarios.
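
One common way to see what an anisotropic embedding space means is the average pairwise cosine similarity: when all vectors share a large common direction, nearly every pair looks similar. The sketch below computes that diagnostic on toy 2-D vectors. It is not the TOCOL objective, and the vectors, the `offset`, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Directions spread evenly around the origin.
isotropic = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
# The same vectors after adding a shared offset: they now occupy a narrow cone.
offset = [5.0, 5.0]
anisotropic = [[a + o for a, o in zip(v, offset)] for v in isotropic]

iso_score = mean_pairwise_cosine(isotropic)
aniso_score = mean_pairwise_cosine(anisotropic)
print(f"isotropic: {iso_score:.3f}, anisotropic: {aniso_score:.3f}")
```

When dot-product attention is computed over vectors from the second set, scores are dominated by the shared direction rather than by token-specific content, which is the bias the abstract describes.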

Citations: 0
Stress detection with encoding physiological signals and convolutional neural network
IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-03-15 DOI: 10.1007/s10994-023-06509-4
Michela Quadrini, Antonino Capuccio, Denise Falcone, Sebastian Daberdaku, Alessandro Blanda, Luca Bellanova, Gianluca Gerard

Stress is a significant and growing phenomenon in the modern world that leads to numerous health problems. Developing robust, non-invasive methods for early and accurate stress detection is crucial to enhancing people’s quality of life. Previous research shows that machine learning applied to physiological signals is a reliable stress predictor. However, it requires determining features by hand. Such a selection is a challenge in this context since stress causes nonspecific human responses. This work overcomes such limitations with STREDWES, an approach for Stress Detection from Wearable Sensors Data. STREDWES encodes fragments of physiological signals into images and classifies them with a Convolutional Neural Network (CNN). We study several encoding methods, including the Gramian Angular Summation/Difference Field and the Markov Transition Field, to evaluate the best way to encode signals into images in this domain. This study is performed on the NEURO dataset. Moreover, we investigate the usefulness of STREDWES in real scenarios by considering the SWELL dataset and a personalized approach. Finally, we compare the proposed approach with its competitors on the WESAD dataset, where it outperforms them.
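The signal-to-image encoding mentioned in the abstract can be sketched as follows. This is a minimal NumPy version of the standard Gramian Angular Summation and Difference Fields, assuming min-max rescaling to [-1, 1]; it is not the STREDWES implementation, and the function name `gramian_angular_fields` and the synthetic sine signal are placeholders chosen for the example.

```python
import numpy as np


def gramian_angular_fields(signal):
    """Return (GASF, GADF) images for a 1-D signal (hypothetical helper)."""
    x = np.asarray(signal, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a constant signal cannot be rescaled to [-1, 1]")
    # Rescale to [-1, 1] so every sample is a valid cosine value.
    scaled = np.clip(2.0 * (x - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    phi = np.arccos(scaled)                         # polar-coordinate angle per sample
    gasf = np.cos(phi[:, None] + phi[None, :])      # summation field
    gadf = np.sin(phi[:, None] - phi[None, :])      # difference field
    return gasf, gadf


# Synthetic stand-in for one fragment of a physiological signal.
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 32))
gasf, gadf = gramian_angular_fields(signal)
print(gasf.shape, gadf.shape)
```

Each fragment of length n becomes an n-by-n image: the summation field is symmetric, and the difference field is antisymmetric with a zero diagonal. Either image, or both stacked as channels, could then be passed to a CNN classifier.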

Citations: 0