In recent years, urban safety has become a paramount concern for city planners and law enforcement agencies. Accurate prediction of likely crime occurrences can significantly enhance preventive measures and resource allocation. However, many law enforcement departments lack the tools to analyze and apply advanced AI and ML techniques that can support city planners, watch programs, and safety leaders in taking proactive steps towards overall community safety. This paper explores the effectiveness of ML techniques for predicting spatial and temporal patterns of crime in urban areas. Leveraging police dispatch call data from San Jose, CA, the research goal is to achieve a high degree of accuracy in categorizing calls into priority levels, particularly for more dangerous situations that require an immediate law enforcement response. This categorization is informed by the time, place, and nature of the call. The research steps include data extraction, preprocessing, feature engineering, exploratory data analysis, and the implementation, optimization, and tuning of different supervised machine learning models and neural networks. Accuracy and precision are examined for different models and feature sets at varying granularities of crime category and location precision. The results demonstrate that, compared to a variety of other models, Random Forest classification models are most effective in identifying dangerous situations and their corresponding priority levels with high accuracy (Accuracy = 85%, AUC = 0.92) at a local level while minimizing false negatives. While further research and data gathering are needed to include other social and economic factors, these results provide valuable insights for law enforcement agencies to optimize resources, develop proactive deployment approaches, and adjust response patterns to enhance overall public safety outcomes in an unbiased way.
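As a concrete reference for the modeling step described above, the sketch below shows how a Random Forest classifier with class weighting (to keep false negatives on the rare high-priority class low) might be set up in scikit-learn; the feature names, the priority label, and the synthetic data are assumptions for illustration, not the authors' actual San Jose dispatch pipeline.

```python
# Illustrative sketch only: feature names, priority labels, and the synthetic
# data are assumptions, not the paper's actual pipeline or San Jose schema.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
n = 5_000
calls = pd.DataFrame({
    "hour": rng.integers(0, 24, n),            # time of call
    "day_of_week": rng.integers(0, 7, n),
    "call_type_code": rng.integers(0, 50, n),  # encoded nature of the call
    "beat_id": rng.integers(0, 80, n),         # coarse location grid
})
# Hypothetical binary target: 1 = high-priority / dangerous, 0 = routine.
y = (rng.random(n) < 0.2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    calls, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" up-weights the rare high-priority class,
# which helps keep false negatives down.
clf = RandomForestClassifier(
    n_estimators=300, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, proba > 0.5))
print("AUC:", roc_auc_score(y_test, proba))
```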
{"title":"Machine Learning for Public Good: Predicting Urban Crime Patterns to Enhance Community Safety","authors":"Sia Gupta, Simeon Sayer","doi":"arxiv-2409.10838","DOIUrl":"https://doi.org/arxiv-2409.10838","url":null,"abstract":"In recent years, urban safety has become a paramount concern for city\u0000planners and law enforcement agencies. Accurate prediction of likely crime\u0000occurrences can significantly enhance preventive measures and resource\u0000allocation. However, many law enforcement departments lack the tools to analyze\u0000and apply advanced AI and ML techniques that can support city planners, watch\u0000programs, and safety leaders to take proactive steps towards overall community\u0000safety. This paper explores the effectiveness of ML techniques to predict spatial and\u0000temporal patterns of crimes in urban areas. Leveraging police dispatch call\u0000data from San Jose, CA, the research goal is to achieve a high degree of\u0000accuracy in categorizing calls into priority levels particularly for more\u0000dangerous situations that require an immediate law enforcement response. This\u0000categorization is informed by the time, place, and nature of the call. The\u0000research steps include data extraction, preprocessing, feature engineering,\u0000exploratory data analysis, implementation, optimization and tuning of different\u0000supervised machine learning models and neural networks. The accuracy and\u0000precision are examined for different models and features at varying granularity\u0000of crime categories and location precision. The results demonstrate that when compared to a variety of other models,\u0000Random Forest classification models are most effective in identifying dangerous\u0000situations and their corresponding priority levels with high accuracy (Accuracy\u0000= 85%, AUC = 0.92) at a local level while ensuring a minimum amount of false\u0000negatives. While further research and data gathering is needed to include other\u0000social and economic factors, these results provide valuable insights for law\u0000enforcement agencies to optimize resources, develop proactive deployment\u0000approaches, and adjust response patterns to enhance overall public safety\u0000outcomes in an unbiased way.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren
Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance in time-series forecasting, very few show how an LLM could be used for time-series reasoning in natural language. We propose a novel multi-modal time-series LLM approach that learns generalizable information across various domains with powerful zero-shot performance. First, we train a lightweight time-series encoder on top of an LLM to directly extract time-series information. Then, we fine-tune our model with chain-of-thought augmented time-series tasks to encourage the model to generate reasoning paths. We show that our model learns a latent representation that reflects specific time-series features (e.g., slope, frequency) and that it outperforms GPT-4o on a set of zero-shot reasoning tasks across a variety of domains.
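A minimal sketch of the general pattern described above (a lightweight encoder that turns a raw time series into soft tokens prepended to the LLM's text embeddings) is given below; the patch size, hidden sizes, and the stand-in embedding table are assumptions, not the paper's actual architecture.

```python
# Hedged sketch of a lightweight time-series encoder whose outputs are
# prepended to text-token embeddings; dimensions and the dummy embedding
# table are assumptions, not the paper's actual design.
import torch
import torch.nn as nn

class TimeSeriesEncoder(nn.Module):
    def __init__(self, patch_len=16, d_model=4096):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Sequential(
            nn.Linear(patch_len, 512), nn.GELU(), nn.Linear(512, d_model)
        )

    def forward(self, series):                 # series: (batch, length)
        b, l = series.shape
        n_patches = l // self.patch_len
        patches = series[:, : n_patches * self.patch_len]
        patches = patches.reshape(b, n_patches, self.patch_len)
        return self.proj(patches)              # (batch, n_patches, d_model)

d_model = 4096                                  # assumed LLM hidden size
text_embed = nn.Embedding(32000, d_model)       # stand-in for the LLM's embedding table

series = torch.randn(2, 128)                    # two raw time series
text_ids = torch.randint(0, 32000, (2, 12))     # tokenized prompt (dummy ids)

ts_tokens = TimeSeriesEncoder(d_model=d_model)(series)
prompt_tokens = text_embed(text_ids)
# Soft time-series tokens are concatenated in front of the text tokens and
# would then be fed through the (frozen or fine-tuned) LLM backbone.
llm_input = torch.cat([ts_tokens, prompt_tokens], dim=1)
print(llm_input.shape)                          # (2, 20, 4096)
```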
{"title":"Towards Time Series Reasoning with LLMs","authors":"Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren","doi":"arxiv-2409.11376","DOIUrl":"https://doi.org/arxiv-2409.11376","url":null,"abstract":"Multi-modal large language models (MLLMs) have enabled numerous advances in\u0000understanding and reasoning in domains like vision, but we have not yet seen\u0000this broad success for time-series. Although prior works on time-series MLLMs\u0000have shown promising performance in time-series forecasting, very few works\u0000show how an LLM could be used for time-series reasoning in natural language. We\u0000propose a novel multi-modal time-series LLM approach that learns generalizable\u0000information across various domains with powerful zero-shot performance. First,\u0000we train a lightweight time-series encoder on top of an LLM to directly extract\u0000time-series information. Then, we fine-tune our model with chain-of-thought\u0000augmented time-series tasks to encourage the model to generate reasoning paths.\u0000We show that our model learns a latent representation that reflects specific\u0000time-series features (e.g. slope, frequency), as well as outperforming GPT-4o\u0000on a set of zero-shot reasoning tasks on a variety of domains.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Symbolic encoding has been used in multi-operator learning as a way to embed additional information for distinct time-series data. For spatiotemporal systems described by time-dependent partial differential equations, the equation itself provides an additional modality to identify the system. The utilization of symbolic expressions alongside time-series samples allows for the development of multimodal predictive neural networks. A key challenge with current approaches is that the symbolic information, i.e., the equations, must be manually preprocessed (simplified, rearranged, etc.) to match and relate to the existing token library, which increases costs and reduces flexibility, especially when dealing with new differential equations. We propose a new token library based on SymPy to encode differential equations as an additional modality for time-series models. The proposed approach incurs minimal cost, is automated, and maintains high prediction accuracy for forecasting tasks. Additionally, we include a Bayesian filtering module that connects the different modalities to refine the learned equation. This improves the accuracy of the learned symbolic representation and the predicted time-series.
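As an illustration of the kind of encoding described above, the sketch below tokenizes a SymPy expression for a PDE via a pre-order walk of its expression tree; the example equation and the token mapping are assumptions, not the paper's actual vocabulary.

```python
# Minimal sketch of encoding a differential equation as a token sequence via
# SymPy's expression tree; the vocabulary and traversal are illustrative
# assumptions, not the paper's token library.
import sympy as sp

t, x = sp.symbols("t x")
u = sp.Function("u")(t, x)

# Example PDE: u_t + u * u_x = 0 (inviscid Burgers), written as an expression.
pde = sp.Derivative(u, t) + u * sp.Derivative(u, x)

def tokenize(expr):
    """Pre-order traversal of the SymPy expression tree into symbolic tokens."""
    if expr.is_Symbol or expr.is_Number:
        return [str(expr)]
    tokens = [type(expr).__name__]          # e.g. 'Add', 'Mul', 'Derivative'
    for arg in expr.args:
        tokens.extend(tokenize(arg))
    return tokens

# Prints a flat token sequence (starting with 'Add'); the exact ordering
# depends on SymPy's internal argument sorting and version.
print(tokenize(pde))
```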
{"title":"Time-Series Forecasting, Knowledge Distillation, and Refinement within a Multimodal PDE Foundation Model","authors":"Derek Jollie, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer","doi":"arxiv-2409.11609","DOIUrl":"https://doi.org/arxiv-2409.11609","url":null,"abstract":"Symbolic encoding has been used in multi-operator learning as a way to embed\u0000additional information for distinct time-series data. For spatiotemporal\u0000systems described by time-dependent partial differential equations, the\u0000equation itself provides an additional modality to identify the system. The\u0000utilization of symbolic expressions along side time-series samples allows for\u0000the development of multimodal predictive neural networks. A key challenge with\u0000current approaches is that the symbolic information, i.e. the equations, must\u0000be manually preprocessed (simplified, rearranged, etc.) to match and relate to\u0000the existing token library, which increases costs and reduces flexibility,\u0000especially when dealing with new differential equations. We propose a new token\u0000library based on SymPy to encode differential equations as an additional\u0000modality for time-series models. The proposed approach incurs minimal cost, is\u0000automated, and maintains high prediction accuracy for forecasting tasks.\u0000Additionally, we include a Bayesian filtering module that connects the\u0000different modalities to refine the learned equation. This improves the accuracy\u0000of the learned symbolic representation and the predicted time-series.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziwei Wu, Lecheng Zheng, Yuancheng Yu, Ruizhong Qiu, John Birge, Jingrui He
Anomaly detection (AD) has been widely studied for decades in many real-world applications, including fraud detection in finance and intrusion detection in cybersecurity. Due to the imbalance between protected and unprotected groups and the imbalanced distributions of normal examples and anomalies, the learning objectives of most existing anomaly detection methods tend to concentrate solely on the dominating unprotected group. Thus, many researchers have recognized the significance of ensuring model fairness in anomaly detection. However, existing fair anomaly detection methods tend to erroneously label most normal examples from the protected group as anomalies in the imbalanced scenario where the unprotected group is more abundant than the protected group. This phenomenon is caused by the improper design of learning objectives, which statistically focus on learning the frequent patterns (i.e., the unprotected group) while overlooking the under-represented patterns (i.e., the protected group). To address these issues, we propose FairAD, a fairness-aware anomaly detection method targeting the imbalanced scenario. It consists of a fairness-aware contrastive learning module and a rebalancing autoencoder module to ensure fairness and handle the imbalanced data issue, respectively. Moreover, we provide a theoretical analysis showing that our proposed contrastive learning regularization guarantees group fairness. Empirical studies demonstrate the effectiveness and efficiency of FairAD across multiple real-world datasets.
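For intuition about the rebalancing idea mentioned above (though not FairAD's actual objective), the sketch below trains a reconstruction-based detector with group-balanced sampling so the learned notion of "normal" is not dominated by the majority group; the data and group attribute are synthetic.

```python
# Generic illustration of group-rebalanced training for a reconstruction-based
# anomaly detector; this is NOT the FairAD objective, only the rebalancing idea.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, WeightedRandomSampler, DataLoader

# Synthetic features and a binary group attribute (protected group is rare).
n, d = 2000, 16
X = torch.randn(n, d)
group = (torch.rand(n) < 0.1).long()

# Weight samples inversely to group frequency so each group is seen equally.
counts = torch.bincount(group, minlength=2).float()
weights = 1.0 / counts[group]
sampler = WeightedRandomSampler(weights, num_samples=n, replacement=True)
loader = DataLoader(TensorDataset(X), batch_size=64, sampler=sampler)

autoencoder = nn.Sequential(nn.Linear(d, 8), nn.ReLU(), nn.Linear(8, d))
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

for epoch in range(5):
    for (batch,) in loader:
        recon = autoencoder(batch)
        loss = ((recon - batch) ** 2).mean()    # reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()

# Anomaly score = per-sample reconstruction error.
with torch.no_grad():
    scores = ((autoencoder(X) - X) ** 2).mean(dim=1)
```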
{"title":"Fair Anomaly Detection For Imbalanced Groups","authors":"Ziwei Wu, Lecheng Zheng, Yuancheng Yu, Ruizhong Qiu, John Birge, Jingrui He","doi":"arxiv-2409.10951","DOIUrl":"https://doi.org/arxiv-2409.10951","url":null,"abstract":"Anomaly detection (AD) has been widely studied for decades in many real-world\u0000applications, including fraud detection in finance, and intrusion detection for\u0000cybersecurity, etc. Due to the imbalanced nature between protected and\u0000unprotected groups and the imbalanced distributions of normal examples and\u0000anomalies, the learning objectives of most existing anomaly detection methods\u0000tend to solely concentrate on the dominating unprotected group. Thus, it has\u0000been recognized by many researchers about the significance of ensuring model\u0000fairness in anomaly detection. However, the existing fair anomaly detection\u0000methods tend to erroneously label most normal examples from the protected group\u0000as anomalies in the imbalanced scenario where the unprotected group is more\u0000abundant than the protected group. This phenomenon is caused by the improper\u0000design of learning objectives, which statistically focus on learning the\u0000frequent patterns (i.e., the unprotected group) while overlooking the\u0000under-represented patterns (i.e., the protected group). To address these\u0000issues, we propose FairAD, a fairness-aware anomaly detection method targeting\u0000the imbalanced scenario. It consists of a fairness-aware contrastive learning\u0000module and a rebalancing autoencoder module to ensure fairness and handle the\u0000imbalanced data issue, respectively. Moreover, we provide the theoretical\u0000analysis that shows our proposed contrastive learning regularization guarantees\u0000group fairness. Empirical studies demonstrate the effectiveness and efficiency\u0000of FairAD across multiple real-world datasets.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alejandro García-Castellanos, Giovanni Luca Marchetti, Danica Kragic, Martina Scolamiero
Relative representations are an established approach to zero-shot model stitching, consisting of a non-trainable transformation of the latent space of a deep neural network. Based on insights of a topological and geometric nature, we propose two improvements to relative representations. First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. The latter coincides with the symmetries in parameter space induced by common activation functions. Second, we propose deploying topological densification, a topological regularization loss that encourages clustering within classes, when fine-tuning relative representations. We provide an empirical investigation on a natural language task, where both proposed variations yield improved performance on zero-shot model stitching.
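A small sketch of the relative-representation transform is given below; the per-dimension standardization step is only an illustrative stand-in for the normalization procedure proposed in the paper, whose exact form is not reproduced here.

```python
# Sketch of the relative-representation transform: each sample is re-expressed
# as cosine similarities to a set of anchor embeddings. The per-dimension
# standardization shown is an illustrative assumption, not the paper's exact
# normalization.
import numpy as np

rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 64))      # latent codes from some encoder
anchors = latents[:10]                    # anchor samples chosen from the data

def relative_representation(z, anchors, normalize=True):
    if normalize:
        # Standardize each latent dimension (statistics estimated on the
        # anchors) so the transform ignores non-isotropic rescalings of
        # individual coordinates.
        mu = anchors.mean(axis=0, keepdims=True)
        sigma = anchors.std(axis=0, keepdims=True) + 1e-8
        z = (z - mu) / sigma
        anchors = (anchors - mu) / sigma
    # Cosine similarity of every sample against every anchor.
    z_n = z / np.linalg.norm(z, axis=1, keepdims=True)
    a_n = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return z_n @ a_n.T                    # shape: (n_samples, n_anchors)

rel = relative_representation(latents, anchors)
print(rel.shape)                          # (100, 10)
```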
{"title":"Relative Representations: Topological and Geometric Perspectives","authors":"Alejandro García-Castellanos, Giovanni Luca Marchetti, Danica Kragic, Martina Scolamiero","doi":"arxiv-2409.10967","DOIUrl":"https://doi.org/arxiv-2409.10967","url":null,"abstract":"Relative representations are an established approach to zero-shot model\u0000stitching, consisting of a non-trainable transformation of the latent space of\u0000a deep neural network. Based on insights of topological and geometric nature,\u0000we propose two improvements to relative representations. First, we introduce a\u0000normalization procedure in the relative transformation, resulting in invariance\u0000to non-isotropic rescalings and permutations. The latter coincides with the\u0000symmetries in parameter space induced by common activation functions. Second,\u0000we propose to deploy topological densification when fine-tuning relative\u0000representations, a topological regularization loss encouraging clustering\u0000within classes. We provide an empirical investigation on a natural language\u0000task, where both the proposed variations yield improved performance on\u0000zero-shot model stitching.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Willa Potosnak, Cristian Challu, Mononito Goswami, Michał Wiliński, Nina Żukowska
Recently, time series foundation models have shown promising zero-shot forecasting performance on time series from a wide range of domains. However, it remains unclear whether their success stems from a true understanding of temporal dynamics or simply from memorizing the training data. While implicit reasoning in language models has been studied, similar evaluations for time series models have been largely unexplored. This work takes an initial step toward assessing the reasoning abilities of deep time series forecasting models. We find that certain linear, MLP-based, and patch-based Transformer models generalize effectively in systematically orchestrated out-of-distribution scenarios, suggesting underexplored reasoning capabilities beyond simple pattern memorization.
{"title":"Implicit Reasoning in Deep Time Series Forecasting","authors":"Willa Potosnak, Cristian Challu, Mononito Goswami, Michał Wiliński, Nina Żukowska","doi":"arxiv-2409.10840","DOIUrl":"https://doi.org/arxiv-2409.10840","url":null,"abstract":"Recently, time series foundation models have shown promising zero-shot\u0000forecasting performance on time series from a wide range of domains. However,\u0000it remains unclear whether their success stems from a true understanding of\u0000temporal dynamics or simply from memorizing the training data. While implicit\u0000reasoning in language models has been studied, similar evaluations for time\u0000series models have been largely unexplored. This work takes an initial step\u0000toward assessing the reasoning abilities of deep time series forecasting\u0000models. We find that certain linear, MLP-based, and patch-based Transformer\u0000models generalize effectively in systematically orchestrated\u0000out-of-distribution scenarios, suggesting underexplored reasoning capabilities\u0000beyond simple pattern memorization.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the emergence of integrated sensing, communication, and computation (ISCC) in the upcoming 6G era, federated learning with ISCC (FL-ISCC), integrating sample collection, local training, and parameter exchange and aggregation, has garnered increasing interest for enhancing training efficiency. Currently, FL-ISCC primarily includes two algorithms: FedAVG-ISCC and FedSGD-ISCC. However, the theoretical understanding of the performance and advantages of these algorithms remains limited. To address this gap, we investigate a general FL-ISCC framework, implementing both FedAVG-ISCC and FedSGD-ISCC. We experimentally demonstrate the substantial potential of the ISCC framework in reducing latency and energy consumption in FL. Furthermore, we provide a theoretical analysis and comparison. The results reveal that: 1) Both sample collection and communication errors negatively impact algorithm performance, highlighting the need for careful design to optimize FL-ISCC applications. 2) FedAVG-ISCC performs better than FedSGD-ISCC under IID data due to its advantage with multiple local updates. 3) FedSGD-ISCC is more robust than FedAVG-ISCC under non-IID data, where the multiple local updates in FedAVG-ISCC worsen performance as data heterogeneity increases; FedSGD-ISCC maintains performance levels similar to those under IID conditions. 4) FedSGD-ISCC is more resilient to communication errors than FedAVG-ISCC, which suffers from significant performance degradation as communication errors increase. Extensive simulations confirm the effectiveness of the FL-ISCC framework and validate our theoretical analysis.
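To make the algorithmic difference concrete, the sketch below contrasts plain FedSGD (the server averages one gradient per client per round) with plain FedAvg (the server averages models after several local steps) on a toy least-squares problem; the ISCC-specific sensing and channel-error models from the paper are deliberately omitted.

```python
# Plain FedAvg vs. FedSGD aggregation on a toy linear model; the ISCC-specific
# sensing and communication-error models are intentionally left out.
import numpy as np

rng = np.random.default_rng(0)
num_clients, d, lr, local_steps = 5, 10, 0.1, 4
w_global = np.zeros(d)

# Toy per-client data for a least-squares objective.
clients = [(rng.normal(size=(50, d)), rng.normal(size=50)) for _ in range(num_clients)]

def grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

# FedSGD: one gradient per client per round, server averages the gradients.
def fedsgd_round(w):
    g_avg = np.mean([grad(w, X, y) for X, y in clients], axis=0)
    return w - lr * g_avg

# FedAvg: several local SGD steps per client, server averages the models.
def fedavg_round(w):
    local_models = []
    for X, y in clients:
        w_local = w.copy()
        for _ in range(local_steps):
            w_local -= lr * grad(w_local, X, y)
        local_models.append(w_local)
    return np.mean(local_models, axis=0)

w_sgd, w_avg = w_global.copy(), w_global.copy()
for _ in range(20):
    w_sgd = fedsgd_round(w_sgd)
    w_avg = fedavg_round(w_avg)
```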
{"title":"Federated Learning with Integrated Sensing, Communication, and Computation: Frameworks and Performance Analysis","authors":"Yipeng Liang, Qimei Chen, Hao Jiang","doi":"arxiv-2409.11240","DOIUrl":"https://doi.org/arxiv-2409.11240","url":null,"abstract":"With the emergence of integrated sensing, communication, and computation\u0000(ISCC) in the upcoming 6G era, federated learning with ISCC (FL-ISCC),\u0000integrating sample collection, local training, and parameter exchange and\u0000aggregation, has garnered increasing interest for enhancing training\u0000efficiency. Currently, FL-ISCC primarily includes two algorithms: FedAVG-ISCC\u0000and FedSGD-ISCC. However, the theoretical understanding of the performance and\u0000advantages of these algorithms remains limited. To address this gap, we\u0000investigate a general FL-ISCC framework, implementing both FedAVG-ISCC and\u0000FedSGD-ISCC. We experimentally demonstrate the substantial potential of the\u0000ISCC framework in reducing latency and energy consumption in FL. Furthermore,\u0000we provide a theoretical analysis and comparison. The results reveal that:1)\u0000Both sample collection and communication errors negatively impact algorithm\u0000performance, highlighting the need for careful design to optimize FL-ISCC\u0000applications. 2) FedAVG-ISCC performs better than FedSGD-ISCC under IID data\u0000due to its advantage with multiple local updates. 3) FedSGD-ISCC is more robust\u0000than FedAVG-ISCC under non-IID data, where the multiple local updates in\u0000FedAVG-ISCC worsen performance as non-IID data increases. FedSGD-ISCC maintains\u0000performance levels similar to IID conditions. 4) FedSGD-ISCC is more resilient\u0000to communication errors than FedAVG-ISCC, which suffers from significant\u0000performance degradation as communication errors increase.Extensive simulations\u0000confirm the effectiveness of the FL-ISCC framework and validate our theoretical\u0000analysis.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study was the second part of my master's dissertation and compared the power consumption of training a regression ML model using the Comma-Separated-Values (CSV) and Parquet dataset formats with default floating point (32-bit) and Nvidia mixed precision (16-bit and 32-bit). The same custom PC as in the first part, which was dedicated to classification testing and analysis, was used to perform the experiments, and different ML hyper-parameters, such as batch size, neurons, and epochs, were chosen to build deep neural networks (DNNs). A benchmarking test with default hyper-parameter values for the DNN was used as a reference, while the experiments used combinations of different settings. The results were recorded in Excel, and descriptive statistics were used to calculate the means of the groups and compare them using graphs and tables. The outcome was positive when mixed precision was combined with specific hyper-parameters. Compared to the benchmark, optimising the regression models reduced power consumption by between 7 and 11 Watts. The regression results show that while mixed precision can help improve power consumption, the hyper-parameters must be chosen carefully: large batch sizes and high neuron counts negatively affect power consumption. However, this research required inferential statistics, specifically ANOVA and a T-test, to compare the relationship between the means. The results reported no statistical significance between the means in the regression tests, and H0 was accepted. Therefore, choosing different ML techniques and the Parquet dataset format will not improve computational power consumption or the overall ML carbon footprint. However, a more extensive implementation with a cluster of GPUs could increase the sample size significantly, which is an essential factor and could change the outcome of the statistical analysis.
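A hedged sketch of the kind of setup the study describes (Parquet input plus Keras mixed precision for a small regression DNN) is shown below; the file name, column names, and network shape are placeholders rather than the author's actual configuration.

```python
# Hedged sketch: reading a Parquet dataset with pandas and enabling Keras
# mixed precision for a small regression DNN. Path, column names, and the
# network shape are placeholders, not the study's actual configuration.
import pandas as pd
import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")   # 16/32-bit mix

df = pd.read_parquet("dataset.parquet")                       # placeholder path
X = df.drop(columns=["target"]).to_numpy()                    # placeholder column
y = df["target"].to_numpy()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    # Keep the output layer in float32 for a numerically stable regression loss.
    tf.keras.layers.Dense(1, dtype="float32"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, batch_size=64, epochs=10)
```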
{"title":"Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression algorithms","authors":"Andrew Antonopoulos","doi":"arxiv-2409.11071","DOIUrl":"https://doi.org/arxiv-2409.11071","url":null,"abstract":"This study was the 2nd part of my dissertation for my master degree and\u0000compared the power consumption using the Comma-Separated-Values (CSV) and\u0000parquet dataset format with the default floating point (32bit) and Nvidia mixed\u0000precision (16bit and 32bit) while training a regression ML model. The same\u0000custom PC as per the 1st part, which was dedicated to the classification\u0000testing and analysis, was built to perform the experiments, and different ML\u0000hyper-parameters, such as batch size, neurons, and epochs, were chosen to build\u0000Deep Neural Networks (DNN). A benchmarking test with default hyper-parameter\u0000values for the DNN was used as a reference, while the experiments used a\u0000combination of different settings. The results were recorded in Excel, and\u0000descriptive statistics were chosen to calculate the mean between the groups and\u0000compare them using graphs and tables. The outcome was positive when using mixed\u0000precision combined with specific hyper-parameters. Compared to the\u0000benchmarking, optimising the regression models reduced the power consumption\u0000between 7 and 11 Watts. The regression results show that while mixed precision\u0000can help improve power consumption, we must carefully consider the\u0000hyper-parameters. A high number of batch sizes and neurons will negatively\u0000affect power consumption. However, this research required inferential\u0000statistics, specifically ANOVA and T-test, to compare the relationship between\u0000the means. The results reported no statistical significance between the means\u0000in the regression tests and accepted H0. Therefore, choosing different ML\u0000techniques and the Parquet dataset format will not improve the computational\u0000power consumption and the overall ML carbon footprint. However, a more\u0000extensive implementation with a cluster of GPUs can increase the sample size\u0000significantly, as it is an essential factor and can change the outcome of the\u0000statistical analysis.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ali Royat, Seyed Mohamad Moghadas, Lesley De Cruz, Adrian Munteanu
Deep neural networks (DNNs) have demonstrated remarkable performance across various domains, yet their application to temporal graph regression tasks faces significant challenges regarding interpretability. This critical issue, rooted in the inherent complexity of both DNNs and underlying spatio-temporal patterns in the graph, calls for innovative solutions. While interpretability concerns in Graph Neural Networks (GNNs) mirror those of DNNs, to the best of our knowledge, no notable work has addressed the interpretability of temporal GNNs using a combination of Information Bottleneck (IB) principles and prototype-based methods. Our research introduces a novel approach that uniquely integrates these techniques to enhance the interpretability of temporal graph regression models. The key contributions of our work are threefold: We introduce the \underline{G}raph \underline{IN}terpretability in \underline{T}emporal \underline{R}egression task using \underline{I}nformation bottleneck and \underline{P}rototype (GINTRIP) framework, the first combined application of IB and prototype-based methods for interpretable temporal graph tasks. We derive a novel theoretical bound on mutual information (MI), extending the applicability of IB principles to graph regression tasks. We incorporate an unsupervised auxiliary classification head, fostering multi-task learning and diverse concept representation, which enhances the model bottleneck's interpretability. Our model is evaluated on real-world traffic datasets, outperforming existing methods in both forecasting accuracy and interpretability-related metrics.
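The GINTRIP objective itself is not reproduced here; the sketch below only illustrates the two generic ingredients named in the abstract, a variational information-bottleneck regularizer (a KL term that, under a Gaussian encoder, upper-bounds the mutual information I(Z; X)) and a prototype layer that scores embeddings by distance to learnable prototypes.

```python
# Generic variational-IB encoder plus a prototype scoring head; an illustration
# of the two ingredients named in the abstract, not the GINTRIP model itself.
import torch
import torch.nn as nn

class IBPrototypeHead(nn.Module):
    def __init__(self, in_dim, z_dim=16, n_prototypes=8):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, z_dim))

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        # KL(q(z|x) || N(0, I)) acts as the bottleneck term bounding I(Z; X).
        kl = 0.5 * (torch.exp(logvar) + mu ** 2 - 1.0 - logvar).sum(dim=1).mean()
        # Similarity of each sample to each prototype (negative squared distance).
        proto_scores = -torch.cdist(z, self.prototypes) ** 2
        return proto_scores, kl

head = IBPrototypeHead(in_dim=32)
h = torch.randn(4, 32)                    # stand-in node/graph embeddings from a GNN
scores, kl = head(h)
# Training would combine a task loss on `scores` with `beta * kl`.
```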
{"title":"GINTRIP: Interpretable Temporal Graph Regression using Information bottleneck and Prototype-based method","authors":"Ali Royat, Seyed Mohamad Moghadas, Lesley De Cruz, Adrian Munteanu","doi":"arxiv-2409.10996","DOIUrl":"https://doi.org/arxiv-2409.10996","url":null,"abstract":"Deep neural networks (DNNs) have demonstrated remarkable performance across\u0000various domains, yet their application to temporal graph regression tasks faces\u0000significant challenges regarding interpretability. This critical issue, rooted\u0000in the inherent complexity of both DNNs and underlying spatio-temporal patterns\u0000in the graph, calls for innovative solutions. While interpretability concerns\u0000in Graph Neural Networks (GNNs) mirror those of DNNs, to the best of our\u0000knowledge, no notable work has addressed the interpretability of temporal GNNs\u0000using a combination of Information Bottleneck (IB) principles and\u0000prototype-based methods. Our research introduces a novel approach that uniquely\u0000integrates these techniques to enhance the interpretability of temporal graph\u0000regression models. The key contributions of our work are threefold: We\u0000introduce the underline{G}raph underline{IN}terpretability in\u0000underline{T}emporal underline{R}egression task using underline{I}nformation\u0000bottleneck and underline{P}rototype (GINTRIP) framework, the first combined\u0000application of IB and prototype-based methods for interpretable temporal graph\u0000tasks. We derive a novel theoretical bound on mutual information (MI),\u0000extending the applicability of IB principles to graph regression tasks. We\u0000incorporate an unsupervised auxiliary classification head, fostering multi-task\u0000learning and diverse concept representation, which enhances the model\u0000bottleneck's interpretability. Our model is evaluated on real-world traffic\u0000datasets, outperforming existing methods in both forecasting accuracy and\u0000interpretability-related metrics.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient approximation of Adam -- showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to the design of a simpler and computationally efficient algorithm: $\textbf{S}$hampo$\textbf{O}$ with $\textbf{A}$dam in the $\textbf{P}$reconditioner's eigenbasis (SOAP). With regard to improving Shampoo's computational efficiency, the most straightforward approach would be to simply compute Shampoo's eigendecomposition less frequently. Unfortunately, as our empirical results show, this leads to performance degradation that worsens as the eigendecomposition is computed less frequently. SOAP mitigates this degradation by continually updating the running average of the second moment, just as Adam does, but in the current (slowly changing) coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a rotated space, it introduces only one additional hyperparameter (the preconditioning frequency) compared to Adam. We empirically evaluate SOAP on language model pre-training with 360M- and 660M-parameter models. In the large-batch regime, SOAP reduces the number of iterations by over 40% and wall-clock time by over 35% compared to AdamW, with approximately 20% improvements in both metrics compared to Shampoo. An implementation of SOAP is available at https://github.com/nikhilvyas/SOAP.
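A much-simplified, single-matrix sketch of the idea described above (Adam moments maintained in the eigenbasis of Shampoo-style factors) follows; it omits the preconditioning-frequency schedule, bias correction, and the practical stabilizations of the released implementation, so it is an illustration rather than the SOAP algorithm as shipped.

```python
# Simplified single-parameter sketch of running Adam in the eigenbasis of
# Shampoo-style preconditioner factors; NOT the released SOAP implementation.
import numpy as np

def soap_like_step(W, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Accumulate Shampoo's Kronecker factors L = sum G G^T, R = sum G^T G.
    state["L"] += grad @ grad.T
    state["R"] += grad.T @ grad
    # Eigenbases of the factors (in SOAP these are refreshed only periodically,
    # so the basis changes slowly and the stored moments stay meaningful).
    _, QL = np.linalg.eigh(state["L"])
    _, QR = np.linalg.eigh(state["R"])
    # Rotate the gradient into the preconditioner's eigenbasis.
    g_rot = QL.T @ grad @ QR
    # Standard Adam moment updates, but in the rotated coordinates.
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_rot
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_rot ** 2
    update_rot = state["m"] / (np.sqrt(state["v"]) + eps)
    # Rotate the update back to the original parameter space.
    return W - lr * (QL @ update_rot @ QR.T)

m, n = 8, 4
W = np.zeros((m, n))
state = {"L": np.zeros((m, m)), "R": np.zeros((n, n)),
         "m": np.zeros((m, n)), "v": np.zeros((m, n))}
for _ in range(10):
    grad = np.random.randn(m, n)          # stand-in gradient
    W = soap_like_step(W, grad, state)
```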
{"title":"SOAP: Improving and Stabilizing Shampoo using Adam","authors":"Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade","doi":"arxiv-2409.11321","DOIUrl":"https://doi.org/arxiv-2409.11321","url":null,"abstract":"There is growing evidence of the effectiveness of Shampoo, a higher-order\u0000preconditioning method, over Adam in deep learning optimization tasks. However,\u0000Shampoo's drawbacks include additional hyperparameters and computational\u0000overhead when compared to Adam, which only updates running averages of first-\u0000and second-moment quantities. This work establishes a formal connection between\u0000Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient\u0000approximation of Adam -- showing that Shampoo is equivalent to running\u0000Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to\u0000the design of a simpler and computationally efficient algorithm:\u0000$textbf{S}$hampo$textbf{O}$ with $textbf{A}$dam in the\u0000$textbf{P}$reconditioner's eigenbasis (SOAP). With regards to improving Shampoo's computational efficiency, the most\u0000straightforward approach would be to simply compute Shampoo's\u0000eigendecomposition less frequently. Unfortunately, as our empirical results\u0000show, this leads to performance degradation that worsens with this frequency.\u0000SOAP mitigates this degradation by continually updating the running average of\u0000the second moment, just as Adam does, but in the current (slowly changing)\u0000coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a\u0000rotated space, it introduces only one additional hyperparameter (the\u0000preconditioning frequency) compared to Adam. We empirically evaluate SOAP on\u0000language model pre-training with 360m and 660m sized models. In the large batch\u0000regime, SOAP reduces the number of iterations by over 40% and wall clock time\u0000by over 35% compared to AdamW, with approximately 20% improvements in both\u0000metrics compared to Shampoo. An implementation of SOAP is available at\u0000https://github.com/nikhilvyas/SOAP.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}