arXiv - CS - Other Computer Science: Latest Papers

Cyber Framework for Steering and Measurements Collection Over Instrument-Computing Ecosystems

arXiv - CS - Other Computer Science

Pub Date : 2023-07-12 DOI: arxiv-2307.06883

Anees Al-Najjar, Nageswara S. V. Rao, Ramanan Sankaran, Helia Zandi, Debangshu Mukherjee, Maxim Ziatdinov, Craig Bridges

We propose a framework to develop cyber solutions to support the remote steering of science instruments and measurements collection over instrument-computing ecosystems. It is based on provisioning separate data and control connections at the network level, and developing software modules consisting of Python wrappers for instrument commands and Pyro server-client codes that make them available across the ecosystem network. We demonstrate automated measurement transfers and remote steering operations in a microscopy use case for materials research over an ecosystem of Nion microscopes and computing platforms connected over site networks. The proposed framework is currently under further refinement and being adopted to science workflows with automated remote experiments steering for autonomous chemistry laboratories and smart energy grid simulations.

Citations: 0

Towards a unified language in experimental designs propagated by a software framework

arXiv - CS - Other Computer Science

Pub Date : 2023-07-11 DOI: arxiv-2307.11593

Emi Tanaka

Experiments require human decisions in the design process, which in turn are reformulated and summarized as inputs into a system (computational or otherwise) to generate the experimental design. I leverage this system to promote a language of experimental designs by proposing a novel computational framework, called "the grammar of experimental designs", to specify experimental designs based on an object-oriented programming system that declaratively encapsulates the experimental structure. The framework aims to engage human cognition by building experimental designs with modular functions that modify a targeted singular element of the experimental design object. The syntax and semantics of the framework are built upon consideration from multiple perspectives. While the core framework is language-agnostic, the framework is implemented in the `edibble` R-package. A range of examples is shown to demonstrate the utility of the framework.

Citations: 0

The Impact of Process Complexity on Process Performance: A Study using Event Log Data

arXiv - CS - Other Computer Science

Pub Date : 2023-07-11 DOI: arxiv-2307.06106

Maxim Vidgof, Bastian Wurm, Jan Mendling

Complexity is an important characteristic of any business process. The key assumption of much research in Business Process Management is that process complexity has a negative impact on process performance. So far, behavioral studies have measured complexity based on the perception of process stakeholders. The aim of this study is to investigate if such a connection can be supported based on the analysis of event log data. To do so, we employ a set of 38 metrics that capture different dimensions of process complexity. We use these metrics to build various regression models that explain process performance in terms of throughput time. We find that process complexity as captured in event logs explains the throughput time of process executions to a considerable extent, with the respective R-squared reaching up to 0.96. Our study offers implications for empirical research on process performance and can serve as a toolbox for practitioners.
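
As a rough illustration of what an R-squared figure of this kind measures, the sketch below fits a one-variable least-squares line and computes R-squared in plain Python. The function names and the toy numbers are hypothetical; the study itself builds regression models over 38 complexity metrics from real event logs, which this simplified setup does not attempt to reproduce.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: one complexity score and a throughput time per process
complexity_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
throughput_days = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(complexity_scores, throughput_days)
print(f"R-squared: {r_squared(complexity_scores, throughput_days, slope, intercept):.4f}")
```

An R-squared close to 1 means the complexity measure accounts for most of the variation in throughput time; it does not by itself establish a causal link.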

Citations: 0

Compression Performance Analysis of Different File Formats

arXiv - CS - Other Computer Science

Pub Date : 2023-07-08 DOI: arxiv-2308.12275

Han Yang, Guangjun Qin, Yongqing Hu

In data storage and transmission, file compression is a common technique for reducing the volume of data, reducing data storage space and transmission time and bandwidth. However, there are significant differences in the compression performance of different types of file formats, and the benefits vary. In this paper, 22 file formats with approximately 178GB of data were collected and the Zlib algorithm was used for compression experiments to compare performance in order to investigate the compression gains of different file types. The experimental results show that some file types are poorly compressed, with almost constant file size and long compression time, resulting in lower gains; some other file types are significantly reduced in file size and compression time after compression, which can effectively reduce the data volume. Based on the above experimental results, this paper will then selectively reduce the data volume by compression in data storage and transmission for the file types in order to obtain the maximum compression yield.
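
The kind of measurement described above can be sketched with Python's standard `zlib` module. This is a minimal illustration, not the authors' experimental setup: the helper name `compression_gain`, the compression level, and the two synthetic payloads (standing in for a highly compressible and a poorly compressible file type) are all assumptions.

```python
import os
import time
import zlib


def compression_gain(data: bytes, level: int = 6) -> dict:
    """Compress `data` with zlib and report sizes, size ratio, and elapsed time."""
    if not data:
        raise ValueError("data must be non-empty")
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(data),  # lower means a larger gain
        "seconds": elapsed,
    }


# Hypothetical stand-ins for two file types: repetitive text vs. random bytes
text_like = b"timestamp=2023-07-08 level=INFO msg=ok\n" * 10_000
random_like = os.urandom(len(text_like))

for name, payload in [("text-like", text_like), ("random-like", random_like)]:
    stats = compression_gain(payload)
    print(f"{name}: ratio={stats['ratio']:.3f}, time={stats['seconds']:.4f}s")
```

Payloads that are already high-entropy (as many compressed media and archive formats are) tend to show a ratio near 1, which is the "almost constant file size" case the abstract describes.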

Citations: 0

Towards The Ultimate Brain: Exploring Scientific Discovery with ChatGPT AI

arXiv - CS - Other Computer Science

Pub Date : 2023-07-08 DOI: arxiv-2308.12400

Gerardo Adesso

This paper presents a novel approach to scientific discovery using an artificial intelligence (AI) environment known as ChatGPT, developed by OpenAI. This is the first paper entirely generated with outputs from ChatGPT. We demonstrate how ChatGPT can be instructed through a gamification environment to define and benchmark hypothetical physical theories. Through this environment, ChatGPT successfully simulates the creation of a new improved model, called GPT$^4$, which combines the concepts of GPT in AI (generative pretrained transformer) and GPT in physics (generalized probabilistic theory). We show that GPT$^4$ can use its built-in mathematical and statistical capabilities to simulate and analyze physical laws and phenomena. As a demonstration of its language capabilities, GPT$^4$ also generates a limerick about itself. Overall, our results demonstrate the promising potential for human-AI collaboration in scientific discovery, as well as the importance of designing systems that effectively integrate AI's capabilities with human intelligence.

Citations: 0

Towards Mobility Data Science (Vision Paper)

arXiv - CS - Other Computer Science

Pub Date : 2023-06-21 DOI: arxiv-2307.05717

Mohamed Mokbel (University of Minnesota, Minneapolis, USA); Mahmoud Sakr (Université Libre, Brussels, Belgium); Li Xiong (Emory University, Atlanta, USA); Andreas Züfle (Emory University, Atlanta, USA); Jussara Almeida (Federal University of Minas Gerais, Belo Horizonte, Brazil); Walid Aref (Purdue University, West Lafayette, USA); Gennady Andrienko (Fraunhofer IAIS, St. Augustin, Germany); Natalia Andrienko (Fraunhofer IAIS, St. Augustin, Germany); Yang Cao (Kyoto University, Kyoto, Japan); Sanjay Chawla (Qatar Computing Research Institute, Doha, Qatar); Reynold Cheng (University of Hong Kong, Hong Kong, China); Panos Chrysanthis (University of Pittsburgh, Pennsylvania, USA); Xiqi Fei (George Mason University, Fairfax, USA); Gabriel Ghinita (University of Massachusetts at Boston, Boston, USA); Anita Graser (Austrian Institute of Technology, Vienna, Austria); Dimitrios Gunopulos (University of Athens, Greece); Christian Jensen (Aalborg University, Denmark); Joon-Sook Kim (Oak Ridge National Laboratory, USA); Kyoung-Sook Kim (AIST, Tokyo Waterfront, Japan); Peer Kröger (University of Kiel, Germany); John Krumm (University of Southern California, Los Angeles, USA); Johannes Lauer (HERE Technologies, Germany); Amr Magdy (University of California, Riverside, USA); Mario Nascimento (Northeastern University, Boston, USA); Siva Ravada (Oracle Corp., Nashua, USA); Matthias Renz (University of Kiel, Germany); Dimitris Sacharidis (Université Libre, Brussels, Belgium); Cyrus Shahabi (University of Southern California, Los Angeles, USA); Flora Salim (University of New South Wales, Sydney, Australia); Mohamed Sarwat (Arizona State University, Tempe); Maxime Schoemans (Université Libre, Brussels, Belgium); Bettina Speckmann (TU Eindhoven, Netherlands); Egemen Tanin (University of Melbourne, Australia); Yannis Theodoridis (University of Piraeus, Greece); Kristian Torp (Aalborg University, Denmark); Goce Trajcevski (Iowa State University, Ames, USA); Marc van Kreveld (Utrecht University, Netherlands); Carola Wenk (Tulane University, New Orleans, USA); Martin Werner (Technical University of Munich, Munich, Germany); Raymond Wong (Hong Kong University of Science and Technology, Hong Kong, China); Song Wu (Université Libre, Brussels, Belgium); Jianqiu Xu (Nanjing University of Aeronautics and Astronautics, Nanjing, China); Moustafa Youssef (AUC and Alexandria University, Egypt); Demetris Zeinalipour (University of Cyprus, Nicosia, Cyprus); Mengxuan Zhang (Iowa State University, Ames, USA); Esteban Zimányi (Université Libre, Brussels, Belgium)

Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.

Citations: 0

Enjeux de communication dans la multireprésentation cartographique reproductible

arXiv - CS - Other Computer Science

Pub Date : 2023-06-19 DOI: arxiv-2306.10862

Nicolas Lambert (RIATE, CNRS); Timothée Giraud (RIATE, CNRS); Ronan Ysebaert (RIATE, UPCité)

This chapter deepens cartographic communication through a cartographic multirepresentation exercise. Using a single dataset on World population data, the chapter presents a series of 13 different maps to illustrate how mapping is primarily a matter of choices and methods.

Citations: 0

An Interdisciplinary Survey on Origin-destination Flows Modeling: Theory and Techniques

arXiv - CS - Other Computer Science

Pub Date : 2023-06-12 DOI: arxiv-2306.10048

Can Rong, Jingtao Ding, Yong Li

Origin-destination (OD) flow modeling is an extensively researched subject across multiple disciplines, such as the investigation of travel demand in transportation and spatial interaction modeling in geography. However, researchers from different fields tend to employ their own unique research paradigms and lack interdisciplinary communication, preventing the cross-fertilization of knowledge and the development of novel solutions to challenges. This article presents a systematic interdisciplinary survey that comprehensively and holistically scrutinizes OD flows from utilizing fundamental theory to studying the mechanism of population mobility and solving practical problems with engineering techniques, such as computational models. Specifically, regional economics, urban geography, and sociophysics are adept at employing theoretical research methods to explore the underlying mechanisms of OD flows. They have developed three influential theoretical models: the gravity model, the intervening opportunities model, and the radiation model. These models specifically focus on examining the fundamental influences of distance, opportunities, and population on OD flows, respectively. In the meantime, fields such as transportation, urban planning, and computer science primarily focus on addressing four practical problems: OD prediction, OD construction, OD estimation, and OD forecasting. Advanced computational models, such as deep learning models, have gradually been introduced to address these problems more effectively. Finally, based on the existing research, this survey summarizes current challenges and outlines future directions for this topic. Through this survey, we aim to break down the barriers between disciplines in OD flow-related research, fostering interdisciplinary perspectives and modes of thinking.
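
Of the three theoretical models named above, the gravity model is the simplest to write down: in its classic form the flow between two places grows with the product of their populations and decays with a power of the distance between them, T_ij = G * P_i * P_j / d_ij^beta. The sketch below illustrates only that textbook form; the parameter names `g` and `beta`, their default values, and the three example regions are assumptions, not values taken from the survey.

```python
def gravity_flow(origin_pop, dest_pop, distance, g=1.0, beta=2.0):
    """Classic gravity model: flow = g * P_origin * P_dest / distance**beta."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return g * origin_pop * dest_pop / distance ** beta


# Hypothetical regions with populations and pairwise distances in km
populations = {"A": 500_000, "B": 200_000, "C": 50_000}
distances_km = {("A", "B"): 30.0, ("A", "C"): 120.0, ("B", "C"): 90.0}

for (origin, dest), dist in distances_km.items():
    flow = gravity_flow(populations[origin], populations[dest], dist, g=1e-6)
    print(f"{origin} -> {dest}: relative flow {flow:.1f}")
```

In practice `g` and `beta` would be estimated from observed flows; with `beta=2.0`, doubling the distance cuts the predicted flow to a quarter.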

Citations: 0

Robo Sapiens

arXiv - CS - Other Computer Science

Pub Date : 2023-06-05 DOI: arxiv-2310.08323

Chaim Ash, Amelia Hans

This paper proposes a new method of natural language acquisition for robots that does not require the conversion of speech to text. Folks'Talks employs voice2voice technology that enables a robot to understand the meaning of what it is told and to have the ability to learn and understand new languages - inclusive of accent, dialect, and physiological differences. To do this, sound processing and computer vision are incorporated to give the robot a sense of spatiotemporal causality. The "language model" we are proposing equips a robot to imitate a natural speaker's conversational behavior by thinking contextually and articulating its surroundings.

Citations: 0

Generating Private Synthetic Data with Genetic Algorithms

arXiv - CS - Other Computer Science

Pub Date : 2023-06-05 DOI: arxiv-2306.03257

Terrance Liu, Jingwu Tang, Giuseppe Vietri, Zhiwei Steven Wu

We study the problem of efficiently generating differentially private synthetic data that approximate the statistical properties of an underlying sensitive dataset. In recent years, there has been a growing line of work that approaches this problem using first-order optimization techniques. However, such techniques are restricted to optimizing differentiable objectives only, severely limiting the types of analyses that can be conducted. For example, first-order mechanisms have been primarily successful in approximating statistical queries only in the form of marginals for discrete data domains. In some cases, one can circumvent such issues by relaxing the task's objective to maintain differentiability. However, even when possible, these approaches impose a fundamental limitation in which modifications to the minimization problem become additional sources of error. Therefore, we propose Private-GSD, a private genetic algorithm based on zeroth-order optimization heuristics that do not require modifying the original objective. As a result, it avoids the aforementioned limitations of first-order optimization. We empirically evaluate Private-GSD against baseline algorithms on data derived from the American Community Survey across a variety of statistics (otherwise known as statistical queries), both for discrete and real-valued attributes. We show that Private-GSD outperforms the state-of-the-art methods on non-differential queries while matching accuracy in approximating differentiable ones.
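
To make the zeroth-order idea concrete, here is a toy genetic search that evolves a small binary dataset until its column means approach a set of target statistics, using only evaluations of the objective and no gradients. It is not Private-GSD and it contains no privacy mechanism: the sketch assumes the `target` values have already been privatized upstream (for example, by adding noise to the true statistics), and every function name, population size, and parameter here is hypothetical.

```python
import random


def marginal_error(dataset, target):
    """Sum of absolute gaps between each binary column's mean and its target."""
    n = len(dataset)
    return sum(
        abs(sum(row[j] for row in dataset) / n - t) for j, t in enumerate(target)
    )


def crossover(a, b, rng):
    """Build a child from the first rows of one parent and the rest of another."""
    cut = rng.randrange(1, len(a))
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]


def mutate(dataset, rng):
    """Flip one randomly chosen bit in place and return the dataset."""
    i = rng.randrange(len(dataset))
    j = rng.randrange(len(dataset[0]))
    dataset[i][j] = 1 - dataset[i][j]
    return dataset


def evolve(target, n_rows=20, pop_size=10, generations=200, seed=0):
    """Elitist genetic search over candidate synthetic datasets."""
    rng = random.Random(seed)
    population = [
        [[rng.randint(0, 1) for _ in target] for _ in range(n_rows)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated crossovers of elites
        population.sort(key=lambda d: marginal_error(d, target))
        elites = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elites), rng.choice(elites), rng), rng)
            for _ in range(pop_size - len(elites))
        ]
        population = elites + children
    return min(population, key=lambda d: marginal_error(d, target))


target_means = [0.25, 0.75]  # assumed to be already-noised statistics
best = evolve(target_means)
print(f"remaining error: {marginal_error(best, target_means):.3f}")
```

Because the search only ever calls `marginal_error`, the objective could be swapped for a non-differentiable statistic without changing the loop, which is the property the abstract highlights.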

Citations: 0
