In recent years, urban safety has become a paramount concern for city planners and law enforcement agencies. Accurate prediction of likely crime occurrences can significantly enhance preventive measures and resource allocation. However, many law enforcement departments lack the tools to analyze and apply advanced AI and ML techniques that can support city planners, watch programs, and safety leaders in taking proactive steps towards overall community safety. This paper explores the effectiveness of ML techniques for predicting spatial and temporal patterns of crime in urban areas. Leveraging police dispatch call data from San Jose, CA, the research goal is to achieve a high degree of accuracy in categorizing calls into priority levels, particularly for more dangerous situations that require an immediate law enforcement response. This categorization is informed by the time, place, and nature of the call. The research steps include data extraction, preprocessing, feature engineering, exploratory data analysis, and the implementation, optimization, and tuning of different supervised machine learning models and neural networks. Accuracy and precision are examined for different models and feature sets at varying granularities of crime category and location precision. The results demonstrate that, compared to a variety of other models, Random Forest classification models are most effective in identifying dangerous situations and their corresponding priority levels with high accuracy (Accuracy = 85%, AUC = 0.92) at a local level while minimizing false negatives. While further research and data gathering are needed to include other social and economic factors, these results provide valuable insights for law enforcement agencies to optimize resources, develop proactive deployment approaches, and adjust response patterns to enhance overall public safety outcomes in an unbiased way.
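As a concrete reference for the modeling step described above, the sketch below shows how a Random Forest classifier with class weighting (to keep false negatives on the rare high-priority class low) might be set up in scikit-learn; the feature names, the priority label, and the synthetic data are assumptions for illustration, not the authors' actual San Jose dispatch pipeline.

```python
# Illustrative sketch only: feature names, priority labels, and the synthetic
# data are assumptions, not the paper's actual pipeline or San Jose schema.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
n = 5_000
calls = pd.DataFrame({
    "hour": rng.integers(0, 24, n),            # time of call
    "day_of_week": rng.integers(0, 7, n),
    "call_type_code": rng.integers(0, 50, n),  # encoded nature of the call
    "beat_id": rng.integers(0, 80, n),         # coarse location grid
})
# Hypothetical binary target: 1 = high-priority / dangerous, 0 = routine.
y = (rng.random(n) < 0.2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    calls, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" up-weights the rare high-priority class,
# which helps keep false negatives down.
clf = RandomForestClassifier(
    n_estimators=300, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, proba > 0.5))
print("AUC:", roc_auc_score(y_test, proba))
```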
{"title":"Machine Learning for Public Good: Predicting Urban Crime Patterns to Enhance Community Safety","authors":"Sia Gupta, Simeon Sayer","doi":"arxiv-2409.10838","DOIUrl":"https://doi.org/arxiv-2409.10838","url":null,"abstract":"In recent years, urban safety has become a paramount concern for city\u0000planners and law enforcement agencies. Accurate prediction of likely crime\u0000occurrences can significantly enhance preventive measures and resource\u0000allocation. However, many law enforcement departments lack the tools to analyze\u0000and apply advanced AI and ML techniques that can support city planners, watch\u0000programs, and safety leaders to take proactive steps towards overall community\u0000safety. This paper explores the effectiveness of ML techniques to predict spatial and\u0000temporal patterns of crimes in urban areas. Leveraging police dispatch call\u0000data from San Jose, CA, the research goal is to achieve a high degree of\u0000accuracy in categorizing calls into priority levels particularly for more\u0000dangerous situations that require an immediate law enforcement response. This\u0000categorization is informed by the time, place, and nature of the call. The\u0000research steps include data extraction, preprocessing, feature engineering,\u0000exploratory data analysis, implementation, optimization and tuning of different\u0000supervised machine learning models and neural networks. The accuracy and\u0000precision are examined for different models and features at varying granularity\u0000of crime categories and location precision. The results demonstrate that when compared to a variety of other models,\u0000Random Forest classification models are most effective in identifying dangerous\u0000situations and their corresponding priority levels with high accuracy (Accuracy\u0000= 85%, AUC = 0.92) at a local level while ensuring a minimum amount of false\u0000negatives. While further research and data gathering is needed to include other\u0000social and economic factors, these results provide valuable insights for law\u0000enforcement agencies to optimize resources, develop proactive deployment\u0000approaches, and adjust response patterns to enhance overall public safety\u0000outcomes in an unbiased way.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren
Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance in time-series forecasting, very few show how an LLM could be used for time-series reasoning in natural language. We propose a novel multi-modal time-series LLM approach that learns generalizable information across various domains with powerful zero-shot performance. First, we train a lightweight time-series encoder on top of an LLM to directly extract time-series information. Then, we fine-tune our model with chain-of-thought augmented time-series tasks to encourage the model to generate reasoning paths. We show that our model learns a latent representation that reflects specific time-series features (e.g., slope, frequency) and that it outperforms GPT-4o on a set of zero-shot reasoning tasks across a variety of domains.
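A minimal sketch of the general pattern described above (a lightweight encoder that turns a raw time series into soft tokens prepended to the LLM's text embeddings) is given below; the patch size, hidden sizes, and the stand-in embedding table are assumptions, not the paper's actual architecture.

```python
# Hedged sketch of a lightweight time-series encoder whose outputs are
# prepended to text-token embeddings; dimensions and the dummy embedding
# table are assumptions, not the paper's actual design.
import torch
import torch.nn as nn

class TimeSeriesEncoder(nn.Module):
    def __init__(self, patch_len=16, d_model=4096):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Sequential(
            nn.Linear(patch_len, 512), nn.GELU(), nn.Linear(512, d_model)
        )

    def forward(self, series):                 # series: (batch, length)
        b, l = series.shape
        n_patches = l // self.patch_len
        patches = series[:, : n_patches * self.patch_len]
        patches = patches.reshape(b, n_patches, self.patch_len)
        return self.proj(patches)              # (batch, n_patches, d_model)

d_model = 4096                                  # assumed LLM hidden size
text_embed = nn.Embedding(32000, d_model)       # stand-in for the LLM's embedding table

series = torch.randn(2, 128)                    # two raw time series
text_ids = torch.randint(0, 32000, (2, 12))     # tokenized prompt (dummy ids)

ts_tokens = TimeSeriesEncoder(d_model=d_model)(series)
prompt_tokens = text_embed(text_ids)
# Soft time-series tokens are concatenated in front of the text tokens and
# would then be fed through the (frozen or fine-tuned) LLM backbone.
llm_input = torch.cat([ts_tokens, prompt_tokens], dim=1)
print(llm_input.shape)                          # (2, 20, 4096)
```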
{"title":"Towards Time Series Reasoning with LLMs","authors":"Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren","doi":"arxiv-2409.11376","DOIUrl":"https://doi.org/arxiv-2409.11376","url":null,"abstract":"Multi-modal large language models (MLLMs) have enabled numerous advances in\u0000understanding and reasoning in domains like vision, but we have not yet seen\u0000this broad success for time-series. Although prior works on time-series MLLMs\u0000have shown promising performance in time-series forecasting, very few works\u0000show how an LLM could be used for time-series reasoning in natural language. We\u0000propose a novel multi-modal time-series LLM approach that learns generalizable\u0000information across various domains with powerful zero-shot performance. First,\u0000we train a lightweight time-series encoder on top of an LLM to directly extract\u0000time-series information. Then, we fine-tune our model with chain-of-thought\u0000augmented time-series tasks to encourage the model to generate reasoning paths.\u0000We show that our model learns a latent representation that reflects specific\u0000time-series features (e.g. slope, frequency), as well as outperforming GPT-4o\u0000on a set of zero-shot reasoning tasks on a variety of domains.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Symbolic encoding has been used in multi-operator learning as a way to embed additional information for distinct time-series data. For spatiotemporal systems described by time-dependent partial differential equations, the equation itself provides an additional modality to identify the system. The utilization of symbolic expressions alongside time-series samples allows for the development of multimodal predictive neural networks. A key challenge with current approaches is that the symbolic information, i.e., the equations, must be manually preprocessed (simplified, rearranged, etc.) to match and relate to the existing token library, which increases costs and reduces flexibility, especially when dealing with new differential equations. We propose a new token library based on SymPy to encode differential equations as an additional modality for time-series models. The proposed approach incurs minimal cost, is automated, and maintains high prediction accuracy for forecasting tasks. Additionally, we include a Bayesian filtering module that connects the different modalities to refine the learned equation. This improves the accuracy of the learned symbolic representation and the predicted time-series.
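As an illustration of the kind of encoding described above, the sketch below tokenizes a SymPy expression for a PDE via a pre-order walk of its expression tree; the example equation and the token mapping are assumptions, not the paper's actual vocabulary.

```python
# Minimal sketch of encoding a differential equation as a token sequence via
# SymPy's expression tree; the vocabulary and traversal are illustrative
# assumptions, not the paper's token library.
import sympy as sp

t, x = sp.symbols("t x")
u = sp.Function("u")(t, x)

# Example PDE: u_t + u * u_x = 0 (inviscid Burgers), written as an expression.
pde = sp.Derivative(u, t) + u * sp.Derivative(u, x)

def tokenize(expr):
    """Pre-order traversal of the SymPy expression tree into symbolic tokens."""
    if expr.is_Symbol or expr.is_Number:
        return [str(expr)]
    tokens = [type(expr).__name__]          # e.g. 'Add', 'Mul', 'Derivative'
    for arg in expr.args:
        tokens.extend(tokenize(arg))
    return tokens

# Prints a flat token sequence (starting with 'Add'); the exact ordering
# depends on SymPy's internal argument sorting and version.
print(tokenize(pde))
```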
{"title":"Time-Series Forecasting, Knowledge Distillation, and Refinement within a Multimodal PDE Foundation Model","authors":"Derek Jollie, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer","doi":"arxiv-2409.11609","DOIUrl":"https://doi.org/arxiv-2409.11609","url":null,"abstract":"Symbolic encoding has been used in multi-operator learning as a way to embed\u0000additional information for distinct time-series data. For spatiotemporal\u0000systems described by time-dependent partial differential equations, the\u0000equation itself provides an additional modality to identify the system. The\u0000utilization of symbolic expressions along side time-series samples allows for\u0000the development of multimodal predictive neural networks. A key challenge with\u0000current approaches is that the symbolic information, i.e. the equations, must\u0000be manually preprocessed (simplified, rearranged, etc.) to match and relate to\u0000the existing token library, which increases costs and reduces flexibility,\u0000especially when dealing with new differential equations. We propose a new token\u0000library based on SymPy to encode differential equations as an additional\u0000modality for time-series models. The proposed approach incurs minimal cost, is\u0000automated, and maintains high prediction accuracy for forecasting tasks.\u0000Additionally, we include a Bayesian filtering module that connects the\u0000different modalities to refine the learned equation. This improves the accuracy\u0000of the learned symbolic representation and the predicted time-series.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziwei Wu, Lecheng Zheng, Yuancheng Yu, Ruizhong Qiu, John Birge, Jingrui He
Anomaly detection (AD) has been widely studied for decades in many real-world applications, including fraud detection in finance and intrusion detection in cybersecurity. Due to the imbalance between protected and unprotected groups and the imbalanced distributions of normal examples and anomalies, the learning objectives of most existing anomaly detection methods tend to concentrate solely on the dominating unprotected group. Thus, many researchers have recognized the significance of ensuring model fairness in anomaly detection. However, existing fair anomaly detection methods tend to erroneously label most normal examples from the protected group as anomalies in the imbalanced scenario where the unprotected group is more abundant than the protected group. This phenomenon is caused by the improper design of learning objectives, which statistically focus on learning the frequent patterns (i.e., the unprotected group) while overlooking the under-represented patterns (i.e., the protected group). To address these issues, we propose FairAD, a fairness-aware anomaly detection method targeting the imbalanced scenario. It consists of a fairness-aware contrastive learning module and a rebalancing autoencoder module to ensure fairness and handle the imbalanced data issue, respectively. Moreover, we provide a theoretical analysis showing that our proposed contrastive learning regularization guarantees group fairness. Empirical studies demonstrate the effectiveness and efficiency of FairAD across multiple real-world datasets.
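For intuition about the rebalancing idea mentioned above (though not FairAD's actual objective), the sketch below trains a reconstruction-based detector with group-balanced sampling so the learned notion of "normal" is not dominated by the majority group; the data and group attribute are synthetic.

```python
# Generic illustration of group-rebalanced training for a reconstruction-based
# anomaly detector; this is NOT the FairAD objective, only the rebalancing idea.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, WeightedRandomSampler, DataLoader

# Synthetic features and a binary group attribute (protected group is rare).
n, d = 2000, 16
X = torch.randn(n, d)
group = (torch.rand(n) < 0.1).long()

# Weight samples inversely to group frequency so each group is seen equally.
counts = torch.bincount(group, minlength=2).float()
weights = 1.0 / counts[group]
sampler = WeightedRandomSampler(weights, num_samples=n, replacement=True)
loader = DataLoader(TensorDataset(X), batch_size=64, sampler=sampler)

autoencoder = nn.Sequential(nn.Linear(d, 8), nn.ReLU(), nn.Linear(8, d))
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

for epoch in range(5):
    for (batch,) in loader:
        recon = autoencoder(batch)
        loss = ((recon - batch) ** 2).mean()    # reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()

# Anomaly score = per-sample reconstruction error.
with torch.no_grad():
    scores = ((autoencoder(X) - X) ** 2).mean(dim=1)
```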
{"title":"Fair Anomaly Detection For Imbalanced Groups","authors":"Ziwei Wu, Lecheng Zheng, Yuancheng Yu, Ruizhong Qiu, John Birge, Jingrui He","doi":"arxiv-2409.10951","DOIUrl":"https://doi.org/arxiv-2409.10951","url":null,"abstract":"Anomaly detection (AD) has been widely studied for decades in many real-world\u0000applications, including fraud detection in finance, and intrusion detection for\u0000cybersecurity, etc. Due to the imbalanced nature between protected and\u0000unprotected groups and the imbalanced distributions of normal examples and\u0000anomalies, the learning objectives of most existing anomaly detection methods\u0000tend to solely concentrate on the dominating unprotected group. Thus, it has\u0000been recognized by many researchers about the significance of ensuring model\u0000fairness in anomaly detection. However, the existing fair anomaly detection\u0000methods tend to erroneously label most normal examples from the protected group\u0000as anomalies in the imbalanced scenario where the unprotected group is more\u0000abundant than the protected group. This phenomenon is caused by the improper\u0000design of learning objectives, which statistically focus on learning the\u0000frequent patterns (i.e., the unprotected group) while overlooking the\u0000under-represented patterns (i.e., the protected group). To address these\u0000issues, we propose FairAD, a fairness-aware anomaly detection method targeting\u0000the imbalanced scenario. It consists of a fairness-aware contrastive learning\u0000module and a rebalancing autoencoder module to ensure fairness and handle the\u0000imbalanced data issue, respectively. Moreover, we provide the theoretical\u0000analysis that shows our proposed contrastive learning regularization guarantees\u0000group fairness. Empirical studies demonstrate the effectiveness and efficiency\u0000of FairAD across multiple real-world datasets.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alejandro García-Castellanos, Giovanni Luca Marchetti, Danica Kragic, Martina Scolamiero
Relative representations are an established approach to zero-shot model stitching, consisting of a non-trainable transformation of the latent space of a deep neural network. Based on insights of a topological and geometric nature, we propose two improvements to relative representations. First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. The latter coincides with the symmetries in parameter space induced by common activation functions. Second, we propose deploying topological densification, a topological regularization loss that encourages clustering within classes, when fine-tuning relative representations. We provide an empirical investigation on a natural language task, where both proposed variations yield improved performance on zero-shot model stitching.
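A small sketch of the relative-representation transform is given below; the per-dimension standardization step is only an illustrative stand-in for the normalization procedure proposed in the paper, whose exact form is not reproduced here.

```python
# Sketch of the relative-representation transform: each sample is re-expressed
# as cosine similarities to a set of anchor embeddings. The per-dimension
# standardization shown is an illustrative assumption, not the paper's exact
# normalization.
import numpy as np

rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 64))      # latent codes from some encoder
anchors = latents[:10]                    # anchor samples chosen from the data

def relative_representation(z, anchors, normalize=True):
    if normalize:
        # Standardize each latent dimension (statistics estimated on the
        # anchors) so the transform ignores non-isotropic rescalings of
        # individual coordinates.
        mu = anchors.mean(axis=0, keepdims=True)
        sigma = anchors.std(axis=0, keepdims=True) + 1e-8
        z = (z - mu) / sigma
        anchors = (anchors - mu) / sigma
    # Cosine similarity of every sample against every anchor.
    z_n = z / np.linalg.norm(z, axis=1, keepdims=True)
    a_n = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return z_n @ a_n.T                    # shape: (n_samples, n_anchors)

rel = relative_representation(latents, anchors)
print(rel.shape)                          # (100, 10)
```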
{"title":"Relative Representations: Topological and Geometric Perspectives","authors":"Alejandro García-Castellanos, Giovanni Luca Marchetti, Danica Kragic, Martina Scolamiero","doi":"arxiv-2409.10967","DOIUrl":"https://doi.org/arxiv-2409.10967","url":null,"abstract":"Relative representations are an established approach to zero-shot model\u0000stitching, consisting of a non-trainable transformation of the latent space of\u0000a deep neural network. Based on insights of topological and geometric nature,\u0000we propose two improvements to relative representations. First, we introduce a\u0000normalization procedure in the relative transformation, resulting in invariance\u0000to non-isotropic rescalings and permutations. The latter coincides with the\u0000symmetries in parameter space induced by common activation functions. Second,\u0000we propose to deploy topological densification when fine-tuning relative\u0000representations, a topological regularization loss encouraging clustering\u0000within classes. We provide an empirical investigation on a natural language\u0000task, where both the proposed variations yield improved performance on\u0000zero-shot model stitching.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Willa Potosnak, Cristian Challu, Mononito Goswami, Michał Wiliński, Nina Żukowska
Recently, time series foundation models have shown promising zero-shot forecasting performance on time series from a wide range of domains. However, it remains unclear whether their success stems from a true understanding of temporal dynamics or simply from memorizing the training data. While implicit reasoning in language models has been studied, similar evaluations for time series models have been largely unexplored. This work takes an initial step toward assessing the reasoning abilities of deep time series forecasting models. We find that certain linear, MLP-based, and patch-based Transformer models generalize effectively in systematically orchestrated out-of-distribution scenarios, suggesting underexplored reasoning capabilities beyond simple pattern memorization.
{"title":"Implicit Reasoning in Deep Time Series Forecasting","authors":"Willa Potosnak, Cristian Challu, Mononito Goswami, Michał Wiliński, Nina Żukowska","doi":"arxiv-2409.10840","DOIUrl":"https://doi.org/arxiv-2409.10840","url":null,"abstract":"Recently, time series foundation models have shown promising zero-shot\u0000forecasting performance on time series from a wide range of domains. However,\u0000it remains unclear whether their success stems from a true understanding of\u0000temporal dynamics or simply from memorizing the training data. While implicit\u0000reasoning in language models has been studied, similar evaluations for time\u0000series models have been largely unexplored. This work takes an initial step\u0000toward assessing the reasoning abilities of deep time series forecasting\u0000models. We find that certain linear, MLP-based, and patch-based Transformer\u0000models generalize effectively in systematically orchestrated\u0000out-of-distribution scenarios, suggesting underexplored reasoning capabilities\u0000beyond simple pattern memorization.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the emergence of integrated sensing, communication, and computation (ISCC) in the upcoming 6G era, federated learning with ISCC (FL-ISCC), integrating sample collection, local training, and parameter exchange and aggregation, has garnered increasing interest for enhancing training efficiency. Currently, FL-ISCC primarily includes two algorithms: FedAVG-ISCC and FedSGD-ISCC. However, the theoretical understanding of the performance and advantages of these algorithms remains limited. To address this gap, we investigate a general FL-ISCC framework, implementing both FedAVG-ISCC and FedSGD-ISCC. We experimentally demonstrate the substantial potential of the ISCC framework in reducing latency and energy consumption in FL. Furthermore, we provide a theoretical analysis and comparison. The results reveal that: 1) Both sample collection and communication errors negatively impact algorithm performance, highlighting the need for careful design to optimize FL-ISCC applications. 2) FedAVG-ISCC performs better than FedSGD-ISCC under IID data due to its advantage with multiple local updates. 3) FedSGD-ISCC is more robust than FedAVG-ISCC under non-IID data, where the multiple local updates in FedAVG-ISCC worsen performance as data heterogeneity increases; FedSGD-ISCC maintains performance levels similar to those under IID conditions. 4) FedSGD-ISCC is more resilient to communication errors than FedAVG-ISCC, which suffers from significant performance degradation as communication errors increase. Extensive simulations confirm the effectiveness of the FL-ISCC framework and validate our theoretical analysis.
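To make the algorithmic difference concrete, the sketch below contrasts plain FedSGD (the server averages one gradient per client per round) with plain FedAvg (the server averages models after several local steps) on a toy least-squares problem; the ISCC-specific sensing and channel-error models from the paper are deliberately omitted.

```python
# Plain FedAvg vs. FedSGD aggregation on a toy linear model; the ISCC-specific
# sensing and communication-error models are intentionally left out.
import numpy as np

rng = np.random.default_rng(0)
num_clients, d, lr, local_steps = 5, 10, 0.1, 4
w_global = np.zeros(d)

# Toy per-client data for a least-squares objective.
clients = [(rng.normal(size=(50, d)), rng.normal(size=50)) for _ in range(num_clients)]

def grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

# FedSGD: one gradient per client per round, server averages the gradients.
def fedsgd_round(w):
    g_avg = np.mean([grad(w, X, y) for X, y in clients], axis=0)
    return w - lr * g_avg

# FedAvg: several local SGD steps per client, server averages the models.
def fedavg_round(w):
    local_models = []
    for X, y in clients:
        w_local = w.copy()
        for _ in range(local_steps):
            w_local -= lr * grad(w_local, X, y)
        local_models.append(w_local)
    return np.mean(local_models, axis=0)

w_sgd, w_avg = w_global.copy(), w_global.copy()
for _ in range(20):
    w_sgd = fedsgd_round(w_sgd)
    w_avg = fedavg_round(w_avg)
```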
{"title":"Federated Learning with Integrated Sensing, Communication, and Computation: Frameworks and Performance Analysis","authors":"Yipeng Liang, Qimei Chen, Hao Jiang","doi":"arxiv-2409.11240","DOIUrl":"https://doi.org/arxiv-2409.11240","url":null,"abstract":"With the emergence of integrated sensing, communication, and computation\u0000(ISCC) in the upcoming 6G era, federated learning with ISCC (FL-ISCC),\u0000integrating sample collection, local training, and parameter exchange and\u0000aggregation, has garnered increasing interest for enhancing training\u0000efficiency. Currently, FL-ISCC primarily includes two algorithms: FedAVG-ISCC\u0000and FedSGD-ISCC. However, the theoretical understanding of the performance and\u0000advantages of these algorithms remains limited. To address this gap, we\u0000investigate a general FL-ISCC framework, implementing both FedAVG-ISCC and\u0000FedSGD-ISCC. We experimentally demonstrate the substantial potential of the\u0000ISCC framework in reducing latency and energy consumption in FL. Furthermore,\u0000we provide a theoretical analysis and comparison. The results reveal that:1)\u0000Both sample collection and communication errors negatively impact algorithm\u0000performance, highlighting the need for careful design to optimize FL-ISCC\u0000applications. 2) FedAVG-ISCC performs better than FedSGD-ISCC under IID data\u0000due to its advantage with multiple local updates. 3) FedSGD-ISCC is more robust\u0000than FedAVG-ISCC under non-IID data, where the multiple local updates in\u0000FedAVG-ISCC worsen performance as non-IID data increases. FedSGD-ISCC maintains\u0000performance levels similar to IID conditions. 4) FedSGD-ISCC is more resilient\u0000to communication errors than FedAVG-ISCC, which suffers from significant\u0000performance degradation as communication errors increase.Extensive simulations\u0000confirm the effectiveness of the FL-ISCC framework and validate our theoretical\u0000analysis.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study was the second part of my master's dissertation and compared the power consumption of training a regression ML model using the Comma-Separated-Values (CSV) and Parquet dataset formats with default floating point (32-bit) and Nvidia mixed precision (16-bit and 32-bit). The same custom PC as in the first part, which was dedicated to classification testing and analysis, was used to perform the experiments, and different ML hyper-parameters, such as batch size, neurons, and epochs, were chosen to build deep neural networks (DNNs). A benchmarking test with default hyper-parameter values for the DNN was used as a reference, while the experiments used combinations of different settings. The results were recorded in Excel, and descriptive statistics were used to calculate the means of the groups and compare them using graphs and tables. The outcome was positive when mixed precision was combined with specific hyper-parameters. Compared to the benchmark, optimising the regression models reduced power consumption by between 7 and 11 Watts. The regression results show that while mixed precision can help improve power consumption, the hyper-parameters must be chosen carefully: large batch sizes and high neuron counts negatively affect power consumption. However, this research required inferential statistics, specifically ANOVA and a T-test, to compare the relationship between the means. The results reported no statistical significance between the means in the regression tests, and H0 was accepted. Therefore, choosing different ML techniques and the Parquet dataset format will not improve computational power consumption or the overall ML carbon footprint. However, a more extensive implementation with a cluster of GPUs could increase the sample size significantly, which is an essential factor and could change the outcome of the statistical analysis.
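A hedged sketch of the kind of setup the study describes (Parquet input plus Keras mixed precision for a small regression DNN) is shown below; the file name, column names, and network shape are placeholders rather than the author's actual configuration.

```python
# Hedged sketch: reading a Parquet dataset with pandas and enabling Keras
# mixed precision for a small regression DNN. Path, column names, and the
# network shape are placeholders, not the study's actual configuration.
import pandas as pd
import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")   # 16/32-bit mix

df = pd.read_parquet("dataset.parquet")                       # placeholder path
X = df.drop(columns=["target"]).to_numpy()                    # placeholder column
y = df["target"].to_numpy()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    # Keep the output layer in float32 for a numerically stable regression loss.
    tf.keras.layers.Dense(1, dtype="float32"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, batch_size=64, epochs=10)
```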
{"title":"Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression algorithms","authors":"Andrew Antonopoulos","doi":"arxiv-2409.11071","DOIUrl":"https://doi.org/arxiv-2409.11071","url":null,"abstract":"This study was the 2nd part of my dissertation for my master degree and\u0000compared the power consumption using the Comma-Separated-Values (CSV) and\u0000parquet dataset format with the default floating point (32bit) and Nvidia mixed\u0000precision (16bit and 32bit) while training a regression ML model. The same\u0000custom PC as per the 1st part, which was dedicated to the classification\u0000testing and analysis, was built to perform the experiments, and different ML\u0000hyper-parameters, such as batch size, neurons, and epochs, were chosen to build\u0000Deep Neural Networks (DNN). A benchmarking test with default hyper-parameter\u0000values for the DNN was used as a reference, while the experiments used a\u0000combination of different settings. The results were recorded in Excel, and\u0000descriptive statistics were chosen to calculate the mean between the groups and\u0000compare them using graphs and tables. The outcome was positive when using mixed\u0000precision combined with specific hyper-parameters. Compared to the\u0000benchmarking, optimising the regression models reduced the power consumption\u0000between 7 and 11 Watts. The regression results show that while mixed precision\u0000can help improve power consumption, we must carefully consider the\u0000hyper-parameters. A high number of batch sizes and neurons will negatively\u0000affect power consumption. However, this research required inferential\u0000statistics, specifically ANOVA and T-test, to compare the relationship between\u0000the means. The results reported no statistical significance between the means\u0000in the regression tests and accepted H0. Therefore, choosing different ML\u0000techniques and the Parquet dataset format will not improve the computational\u0000power consumption and the overall ML carbon footprint. However, a more\u0000extensive implementation with a cluster of GPUs can increase the sample size\u0000significantly, as it is an essential factor and can change the outcome of the\u0000statistical analysis.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ali Royat, Seyed Mohamad Moghadas, Lesley De Cruz, Adrian Munteanu
Deep neural networks (DNNs) have demonstrated remarkable performance across various domains, yet their application to temporal graph regression tasks faces significant challenges regarding interpretability. This critical issue, rooted in the inherent complexity of both DNNs and underlying spatio-temporal patterns in the graph, calls for innovative solutions. While interpretability concerns in Graph Neural Networks (GNNs) mirror those of DNNs, to the best of our knowledge, no notable work has addressed the interpretability of temporal GNNs using a combination of Information Bottleneck (IB) principles and prototype-based methods. Our research introduces a novel approach that uniquely integrates these techniques to enhance the interpretability of temporal graph regression models. The key contributions of our work are threefold: We introduce the \underline{G}raph \underline{IN}terpretability in \underline{T}emporal \underline{R}egression task using \underline{I}nformation bottleneck and \underline{P}rototype (GINTRIP) framework, the first combined application of IB and prototype-based methods for interpretable temporal graph tasks. We derive a novel theoretical bound on mutual information (MI), extending the applicability of IB principles to graph regression tasks. We incorporate an unsupervised auxiliary classification head, fostering multi-task learning and diverse concept representation, which enhances the model bottleneck's interpretability. Our model is evaluated on real-world traffic datasets, outperforming existing methods in both forecasting accuracy and interpretability-related metrics.
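The GINTRIP objective itself is not reproduced here; the sketch below only illustrates the two generic ingredients named in the abstract, a variational information-bottleneck regularizer (a KL term that, under a Gaussian encoder, upper-bounds the mutual information I(Z; X)) and a prototype layer that scores embeddings by distance to learnable prototypes.

```python
# Generic variational-IB encoder plus a prototype scoring head; an illustration
# of the two ingredients named in the abstract, not the GINTRIP model itself.
import torch
import torch.nn as nn

class IBPrototypeHead(nn.Module):
    def __init__(self, in_dim, z_dim=16, n_prototypes=8):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, z_dim))

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        # KL(q(z|x) || N(0, I)) acts as the bottleneck term bounding I(Z; X).
        kl = 0.5 * (torch.exp(logvar) + mu ** 2 - 1.0 - logvar).sum(dim=1).mean()
        # Similarity of each sample to each prototype (negative squared distance).
        proto_scores = -torch.cdist(z, self.prototypes) ** 2
        return proto_scores, kl

head = IBPrototypeHead(in_dim=32)
h = torch.randn(4, 32)                    # stand-in node/graph embeddings from a GNN
scores, kl = head(h)
# Training would combine a task loss on `scores` with `beta * kl`.
```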
{"title":"GINTRIP: Interpretable Temporal Graph Regression using Information bottleneck and Prototype-based method","authors":"Ali Royat, Seyed Mohamad Moghadas, Lesley De Cruz, Adrian Munteanu","doi":"arxiv-2409.10996","DOIUrl":"https://doi.org/arxiv-2409.10996","url":null,"abstract":"Deep neural networks (DNNs) have demonstrated remarkable performance across\u0000various domains, yet their application to temporal graph regression tasks faces\u0000significant challenges regarding interpretability. This critical issue, rooted\u0000in the inherent complexity of both DNNs and underlying spatio-temporal patterns\u0000in the graph, calls for innovative solutions. While interpretability concerns\u0000in Graph Neural Networks (GNNs) mirror those of DNNs, to the best of our\u0000knowledge, no notable work has addressed the interpretability of temporal GNNs\u0000using a combination of Information Bottleneck (IB) principles and\u0000prototype-based methods. Our research introduces a novel approach that uniquely\u0000integrates these techniques to enhance the interpretability of temporal graph\u0000regression models. The key contributions of our work are threefold: We\u0000introduce the underline{G}raph underline{IN}terpretability in\u0000underline{T}emporal underline{R}egression task using underline{I}nformation\u0000bottleneck and underline{P}rototype (GINTRIP) framework, the first combined\u0000application of IB and prototype-based methods for interpretable temporal graph\u0000tasks. We derive a novel theoretical bound on mutual information (MI),\u0000extending the applicability of IB principles to graph regression tasks. We\u0000incorporate an unsupervised auxiliary classification head, fostering multi-task\u0000learning and diverse concept representation, which enhances the model\u0000bottleneck's interpretability. Our model is evaluated on real-world traffic\u0000datasets, outperforming existing methods in both forecasting accuracy and\u0000interpretability-related metrics.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient approximation of Adam -- showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to the design of a simpler and computationally efficient algorithm: $\textbf{S}$hampo$\textbf{O}$ with $\textbf{A}$dam in the $\textbf{P}$reconditioner's eigenbasis (SOAP). With regard to improving Shampoo's computational efficiency, the most straightforward approach would be to simply compute Shampoo's eigendecomposition less frequently. Unfortunately, as our empirical results show, this leads to performance degradation that worsens as the eigendecomposition is computed less frequently. SOAP mitigates this degradation by continually updating the running average of the second moment, just as Adam does, but in the current (slowly changing) coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a rotated space, it introduces only one additional hyperparameter (the preconditioning frequency) compared to Adam. We empirically evaluate SOAP on language model pre-training with 360M- and 660M-parameter models. In the large-batch regime, SOAP reduces the number of iterations by over 40% and wall-clock time by over 35% compared to AdamW, with approximately 20% improvements in both metrics compared to Shampoo. An implementation of SOAP is available at https://github.com/nikhilvyas/SOAP.
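A much-simplified, single-matrix sketch of the idea described above (Adam moments maintained in the eigenbasis of Shampoo-style factors) follows; it omits the preconditioning-frequency schedule, bias correction, and the practical stabilizations of the released implementation, so it is an illustration rather than the SOAP algorithm as shipped.

```python
# Simplified single-parameter sketch of running Adam in the eigenbasis of
# Shampoo-style preconditioner factors; NOT the released SOAP implementation.
import numpy as np

def soap_like_step(W, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Accumulate Shampoo's Kronecker factors L = sum G G^T, R = sum G^T G.
    state["L"] += grad @ grad.T
    state["R"] += grad.T @ grad
    # Eigenbases of the factors (in SOAP these are refreshed only periodically,
    # so the basis changes slowly and the stored moments stay meaningful).
    _, QL = np.linalg.eigh(state["L"])
    _, QR = np.linalg.eigh(state["R"])
    # Rotate the gradient into the preconditioner's eigenbasis.
    g_rot = QL.T @ grad @ QR
    # Standard Adam moment updates, but in the rotated coordinates.
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_rot
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_rot ** 2
    update_rot = state["m"] / (np.sqrt(state["v"]) + eps)
    # Rotate the update back to the original parameter space.
    return W - lr * (QL @ update_rot @ QR.T)

m, n = 8, 4
W = np.zeros((m, n))
state = {"L": np.zeros((m, m)), "R": np.zeros((n, n)),
         "m": np.zeros((m, n)), "v": np.zeros((m, n))}
for _ in range(10):
    grad = np.random.randn(m, n)          # stand-in gradient
    W = soap_like_step(W, grad, state)
```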
{"title":"SOAP: Improving and Stabilizing Shampoo using Adam","authors":"Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade","doi":"arxiv-2409.11321","DOIUrl":"https://doi.org/arxiv-2409.11321","url":null,"abstract":"There is growing evidence of the effectiveness of Shampoo, a higher-order\u0000preconditioning method, over Adam in deep learning optimization tasks. However,\u0000Shampoo's drawbacks include additional hyperparameters and computational\u0000overhead when compared to Adam, which only updates running averages of first-\u0000and second-moment quantities. This work establishes a formal connection between\u0000Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient\u0000approximation of Adam -- showing that Shampoo is equivalent to running\u0000Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to\u0000the design of a simpler and computationally efficient algorithm:\u0000$textbf{S}$hampo$textbf{O}$ with $textbf{A}$dam in the\u0000$textbf{P}$reconditioner's eigenbasis (SOAP). With regards to improving Shampoo's computational efficiency, the most\u0000straightforward approach would be to simply compute Shampoo's\u0000eigendecomposition less frequently. Unfortunately, as our empirical results\u0000show, this leads to performance degradation that worsens with this frequency.\u0000SOAP mitigates this degradation by continually updating the running average of\u0000the second moment, just as Adam does, but in the current (slowly changing)\u0000coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a\u0000rotated space, it introduces only one additional hyperparameter (the\u0000preconditioning frequency) compared to Adam. We empirically evaluate SOAP on\u0000language model pre-training with 360m and 660m sized models. In the large batch\u0000regime, SOAP reduces the number of iterations by over 40% and wall clock time\u0000by over 35% compared to AdamW, with approximately 20% improvements in both\u0000metrics compared to Shampoo. An implementation of SOAP is available at\u0000https://github.com/nikhilvyas/SOAP.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}