arXiv - CS - General Literature: Latest Papers

A survey study of success factors in data science projects
Pub Date : 2022-01-17 DOI: arxiv-2201.06310
Iñigo Martinez, Elisabeth Viles, Igor G. Olaizola
In recent years, the data science community has pursued excellence and made significant research efforts to develop advanced analytics, focusing on solving technical problems at the expense of organizational and socio-technical challenges. According to previous surveys on the state of data science project management, there is a significant gap between technical and organizational processes. In this article we present new empirical data from a survey of 237 data science professionals on the use of project management methodologies for data science. We provide additional profiling of the survey respondents' roles and their priorities when executing data science projects. Based on this survey study, the main findings are: (1) Agile data science lifecycle is the most widely used framework, but only 25% of the survey participants state that they follow a data science project methodology. (2) The most important success factors are precisely describing stakeholders' needs, communicating the results to end-users, and team collaboration and coordination. (3) Professionals who adhere to a project methodology place greater emphasis on the project's potential risks and pitfalls, version control, the deployment pipeline to production, and data security and privacy.
Citations: 0
Data Science in Perspective
Pub Date : 2022-01-15 DOI: arxiv-2201.05852
Rogerio Rossi
Data and Science has stood out in the generation of results, whether in the projects of the scientific domain or business domain. CERN Project, Scientific Institutes, companies like Walmart, Google, Apple, among others, need data to present their results and make predictions in the competitive data world. Data and Science are words that together culminated in a globally recognized term called Data Science. Data Science is in its initial phase, possibly being part of formal sciences and also being presented as part of applied sciences, capable of generating value and supporting decision making. Data Science considers science and, consequently, the scientific method to promote decision making through data intelligence. In many cases, the application of the method (or part of it) is considered in Data Science projects in scientific domain (social sciences, bioinformatics, geospatial projects) or business domain (finance, logistics, retail), among others. In this sense, this article addresses the perspectives of Data Science as a multidisciplinary area, considering science and the scientific method, and its formal structure which integrates Statistics, Computer Science, and Business Science, also taking into account Artificial Intelligence, emphasizing Machine Learning, among others. The article also deals with the perspective of applied Data Science, since Data Science is used for generating value through scientific and business projects. Data Science persona is also discussed in the article, concerning the education of Data Science professionals and its corresponding profiles, since its projection changes the field of data in the world.
Citations: 0
Data science to investigate temperature profiles of large networks of food refrigeration systems
Pub Date : 2022-01-05 DOI: arxiv-2201.02046
Corneliu Arsene
The electrical generation and transmission infrastructures of many countries are under increased pressure. This partially reflects the move towards low carbon economies and the increased reliance on renewable power generation systems. There has been a reduction in the use of traditional fossil fuel generation systems, which provide a stable base load, and this has been replaced with more unpredictable renewable generation. As a consequence, the available load on the grid is becoming more unstable. To cope with this variability, the UK National Grid has placed emphasis on the investigation of various technical mechanisms (e.g. implementation of smart grids, energy storage technologies, auxiliary power sources), which may be able to prevent critical situations when the grid may sometimes become unstable. The successful implementation of these mechanisms may require large numbers of electrical consumers (e.g. HVAC systems, food refrigeration systems), for example, to make additional investments in energy storage technologies (food refrigeration systems) or to integrate their electrical demand from industrial processes into the National Grid (HVAC systems). However, in the situation of food refrigeration systems, during these critical situations, even if the thermal inertia within refrigeration systems may maintain effective performance of the device for a short period of time (e.g. under 1 minute) when the electrical input load into the system is reduced, this still carries the paramount risk of food safety even for very short periods of time (e.g. under 1 minute). Therefore, before considering any future actions (e.g. investing in energy storage technologies) to prevent the critical situations when the grid becomes unstable, it is also necessary to understand how the temperature profiles evolve over time during normal use inside these massive networks of food refrigeration systems.
Citations: 0
Harmonic numbers as the summation of integrals
Pub Date : 2021-12-01 DOI: arxiv-2112.00257
N. Karjanto
Harmonic numbers arise from the truncation of the harmonic series. The $n^\text{th}$ harmonic number is the sum of the reciprocals of each positive integer up to $n$. In addition to briefly introducing the properties of harmonic numbers, we cover harmonic numbers as the summation of integrals that involve the product of exponential and hyperbolic secant functions. The proof is relatively simple since it only comprises the Principle of Mathematical Induction and integration by parts.
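As a small illustration of the definition given in this abstract, the following Python sketch computes the harmonic number H_n = 1 + 1/2 + ... + 1/n exactly with rational arithmetic. The function name `harmonic_number` is an assumption made for this example and does not come from the paper; the integral representation that the paper proves is not reproduced here.

```python
from fractions import Fraction


def harmonic_number(n: int) -> Fraction:
    """Return the n-th harmonic number as an exact fraction.

    Hypothetical helper for illustration: it sums the reciprocals of the
    positive integers up to n, which is the definition quoted in the abstract.
    By convention the empty sum gives H_0 = 0.
    """
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    # Fraction keeps every partial sum exact, so no rounding error accumulates.
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, harmonic_number(n))
```

For instance, `harmonic_number(4)` evaluates to `Fraction(25, 12)`, since 1 + 1/2 + 1/3 + 1/4 = 25/12.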
Citations: 0
A Taxonomy of Anomalies in Log Data
Pub Date : 2021-11-26 DOI: arxiv-2111.13462
Thorsten Wittkopp, Philipp Wiesner, Dominik Scheinert, Odej Kao
Log data anomaly detection is a core component in the area of artificial intelligence for IT operations. However, the large number of existing methods makes it hard to choose the right approach for a specific system. A better understanding of different kinds of anomalies, and which algorithms are suitable for detecting them, would support researchers and IT operators. Although a common taxonomy for anomalies already exists, it has not yet been applied specifically to log data, pointing out the characteristics and peculiarities in this domain. In this paper, we present a taxonomy for different kinds of log data anomalies and introduce a method for analyzing such anomalies in labeled datasets. We applied our taxonomy to the three common benchmark datasets Thunderbird, Spirit, and BGL, and trained five state-of-the-art unsupervised anomaly detection algorithms to evaluate their performance in detecting different kinds of anomalies. Our results show that the most common anomaly type is also the easiest to predict. Moreover, deep learning-based approaches outperform data mining-based approaches in all anomaly types, but especially when it comes to detecting contextual anomalies.
Citations: 0
A Review on Analysis and Visualization Methods for Biclustering
Pub Date : 2021-11-23 DOI: arxiv-2111.12154
Melih Sozdinler
Recently, biclustering has become one of the hot topics in bioinformatics and attracts the attention of authors from several different disciplines. Hence, many different methodologies from a variety of disciplines are proposed as a solution to the biclustering problem. As a consequence, the variety of solutions makes it harder to evaluate the proposed methods. With this review paper, we aim to discuss both analysis and visualization of biclustering as a guide for the comparisons between brand new and existing biclustering algorithms. Additionally, we concentrate on the tools that provide visualizations with accompanying analysis techniques. Throughout the paper, we give several references that also serve as a short review of the state of the art for those who will pursue research on biclustering. The paper outline is as follows: we first give the visualization and analysis methods, then we evaluate each proposed tool with the visualization contribution and analysis options; finally, we discuss future directions for biclustering and we propose standards for future work.
Citations: 0
Datasets for Online Controlled Experiments
Pub Date : 2021-11-19 DOI: arxiv-2111.10198
C. H. Bryan Liu, Ângelo Cardoso, Paul Couturier, Emma J. McCoy
Online Controlled Experiments (OCE) are the gold standard to measure impact and guide decisions for digital products and services. Despite many methodological advances in this area, the scarcity of public datasets and the lack of a systematic review and categorization hinder its development. We present the first survey and taxonomy for OCE datasets, which highlight the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach to enable quickly deploying improvements or rolling back degrading changes. We release the first such dataset, containing daily checkpoints of decision metrics from multiple, real experiments run on a global e-commerce platform. The dataset design is guided by a broader discussion on data requirements for common statistical tests used in digital experimentation. We demonstrate how to use the dataset in the adaptive stopping scenario using sequential and Bayesian hypothesis tests and learn the relevant parameters for each approach.
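To make the idea of adaptive stopping in this abstract concrete, here is a minimal Python sketch of one classical sequential test, Wald's sequential probability ratio test (SPRT), applied to binary outcomes. This is a stand-in chosen for illustration: the abstract does not say which sequential or Bayesian tests the authors use, and the function name `sprt_bernoulli`, its parameters, and the conversion rates in the example are all assumptions, not taken from the paper or its dataset.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.2):
    """Wald's SPRT for H0: rate = p0 versus H1: rate = p1 on 0/1 outcomes.

    Illustrative sketch only (not the paper's procedure). Returns a pair
    (decision, n), where n is the number of observations consumed and
    decision is "accept_h1", "accept_h0", or "continue".
    """
    if not (0 < p0 < 1 and 0 < p1 < 1) or p0 == p1:
        raise ValueError("p0 and p1 must be distinct rates strictly between 0 and 1")

    # Wald's approximate thresholds on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)   # crossing it favors H1
    lower = math.log(beta / (1 - alpha))   # crossing it favors H0

    llr = 0.0
    n = 0
    for n, outcome in enumerate(observations, start=1):
        # Add the log-likelihood ratio contribution of one observation.
        if outcome:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        # Stop as soon as either threshold is crossed.
        if llr >= upper:
            return ("accept_h1", n)
        if llr <= lower:
            return ("accept_h0", n)
    # Data ran out before either threshold was reached.
    return ("continue", n)


if __name__ == "__main__":
    print(sprt_bernoulli([1, 0, 1, 1, 1, 1], p0=0.1, p1=0.3))
```

The point of the sketch is the stopping rule: the experiment is checked after every observation and ends as soon as the accumulated evidence crosses a threshold, rather than waiting for a fixed sample size.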
Citations: 0
State of the Art of Augmented Reality (AR) Capabilities for Civil Infrastructure Applications
Pub Date : 2021-10-17 DOI: arxiv-2110.08698
Jiaqi Xu, Derek Doyle, Fernando Moreu
Augmented Reality (AR) is a technology superimposing interactional virtual objects onto a real environment. Since the beginning of the millennium, AR technologies have shown rapid growth, with significant research publications in engineering and science. However, the civil infrastructure community has minimally implemented AR technologies to date. One of the challenges that civil engineers face when understanding and using AR is the lack of a classification of AR in the context of capabilities for civil infrastructure applications. Practitioners in civil infrastructure, like most engineering fields, prioritize understanding the level of maturity of a new technology before considering its adoption and field implementation. This paper compares the capabilities of sixteen AR Head-Mounted Devices (HMDs) available in the market since 2017, ranking them in terms of performance for civil infrastructure implementations. Finally, the authors recommend a development framework for practical AR interfaces with civil infrastructure and operations.
Citations: 0
Towards a Theory of Bullshit Visualization
Pub Date : 2021-09-23 DOI: arxiv-2109.12975
Michael Correll
In this unhinged rant, I lay out my suspicion that a lot of visualizations are bullshit: charts that do not have even the common decency to intentionally lie but are totally unconcerned about the state of the world or any practical utility. I suspect that bullshit charts take up a large fraction of the time and attention of actual visualization producers and consumers, and yet are seemingly absent from academic research into visualization design.
Citations: 0
Towards the Classification of Error-Related Potentials using Riemannian Geometry
Pub Date : 2021-09-21 DOI: arxiv-2109.13085
Yichen Tang, Jerry J. Zhang, Paul M. Corballis, Luke E. Hallum
The error-related potential (ErrP) is an event-related potential (ERP) evoked by an experimental participant's recognition of an error during task performance. ErrPs, originally described by cognitive psychologists, have been adopted for use in brain-computer interfaces (BCIs) for the detection and correction of errors, and the online refinement of decoding algorithms. Riemannian geometry-based feature extraction and classification is a new approach to BCI which shows good performance in a range of experimental paradigms, but has yet to be applied to the classification of ErrPs. Here, we describe an experiment that elicited ErrPs in seven normal participants performing a visual discrimination task. Audio feedback was provided on each trial. We used multi-channel electroencephalogram (EEG) recordings to classify ErrPs (success/failure), comparing a Riemannian geometry-based method to a traditional approach that computes time-point features. Overall, the Riemannian approach outperformed the traditional approach (78.2% versus 75.9% accuracy, p < 0.05); this difference was statistically significant (p < 0.05) in three of seven participants. These results indicate that the Riemannian approach better captured the features from feedback-elicited ErrPs, and may have application in BCI for error detection and correction.
Citations: 0