In recent years, the data science community has pursued excellence and made significant research efforts to develop advanced analytics, focusing on solving technical problems at the expense of organizational and socio-technical challenges. According to previous surveys on the state of data science project management, there is a significant gap between technical and organizational processes. In this article we present new empirical data from a survey of 237 data science professionals on the use of project management methodologies for data science. We provide additional profiling of the survey respondents' roles and their priorities when executing data science projects. Based on this survey study, the main findings are: (1) The Agile data science lifecycle is the most widely used framework, but only 25% of the survey participants report following a data science project methodology. (2) The most important success factors are precisely describing stakeholders' needs, communicating the results to end-users, and team collaboration and coordination. (3) Professionals who adhere to a project methodology place greater emphasis on the project's potential risks and pitfalls, version control, the deployment pipeline to production, and data security and privacy.
Title: A survey study of success factors in data science projects; Authors: Iñigo Martinez, Elisabeth Viles, Igor G. Olaizola; Venue: arXiv - CS - General Literature; Published: 2022-01-17; DOI URL: https://doi.org/arxiv-2201.06310
Data and science have both stood out in the generation of results, whether in scientific or business projects. The CERN project, scientific institutes, and companies such as Walmart, Google, and Apple, among others, need data to present their results and make predictions in a competitive data world. Together, the words "data" and "science" form the globally recognized term Data Science. Data Science is in its initial phase; it may be regarded as part of the formal sciences and is also presented as part of the applied sciences, capable of generating value and supporting decision making. Data Science draws on science and, consequently, on the scientific method to promote decision making through data intelligence. In many cases, the method (or part of it) is applied in Data Science projects in scientific domains (social sciences, bioinformatics, geospatial projects) or business domains (finance, logistics, retail), among others. In this sense, this article addresses the perspectives of Data Science as a multidisciplinary area, considering science and the scientific method, and its formal structure, which integrates Statistics, Computer Science, and Business Science, also taking into account Artificial Intelligence, with an emphasis on Machine Learning, among others. The article also deals with the perspective of applied Data Science, since Data Science is used to generate value through scientific and business projects. The Data Science persona is also discussed, concerning the education of Data Science professionals and their corresponding profiles, since its projection changes the field of data in the world.
Title: Data Science in Perspective; Authors: Rogerio Rossi; Venue: arXiv - CS - General Literature; Published: 2022-01-15; DOI URL: https://doi.org/arxiv-2201.05852
The electrical generation and transmission infrastructures of many countries are under increased pressure. This partially reflects the move towards low-carbon economies and the increased reliance on renewable power generation systems. The use of traditional fossil fuel generation systems, which provide a stable base load, has been reduced and replaced with more unpredictable renewable generation. As a consequence, the available load on the grid is becoming more unstable. To cope with this variability, the UK National Grid has placed emphasis on investigating various technical mechanisms (e.g. implementation of smart grids, energy storage technologies, auxiliary power sources) that may be able to prevent critical situations in which the grid becomes unstable. The successful implementation of these mechanisms may require large numbers of electrical consumers (e.g. HVAC systems, food refrigeration systems) to, for example, make additional investments in energy storage technologies (food refrigeration systems) or integrate their electrical demand from industrial processes into the National Grid (HVAC systems). However, in the case of food refrigeration systems, even if the thermal inertia within the refrigeration system maintains effective performance of the device for a short period of time (e.g. under 1 minute) when the electrical input load is reduced during these critical situations, this still carries a paramount food safety risk, even over such very short periods. Therefore, before considering any future actions (e.g. investing in energy storage technologies) to prevent critical situations in which the grid becomes unstable, it is also necessary to understand how temperature profiles evolve over time inside these massive networks of food refrigeration systems during normal use.
Title: Data science to investigate temperature profiles of large networks of food refrigeration systems; Authors: Corneliu Arsene; Venue: arXiv - CS - General Literature; Published: 2022-01-05; DOI URL: https://doi.org/arxiv-2201.02046
Harmonic numbers arise from the truncation of the harmonic series. The $n^\text{th}$ harmonic number is the sum of the reciprocals of each positive integer up to $n$. In addition to briefly introducing the properties of harmonic numbers, we cover harmonic numbers as the summation of integrals that involve the product of exponential and hyperbolic secant functions. The proof is relatively simple since it only comprises the Principle of Mathematical Induction and integration by parts.
Title: Harmonic numbers as the summation of integrals; Authors: N. Karjanto; Venue: arXiv - CS - General Literature; Published: 2021-12-01; DOI URL: https://doi.org/arxiv-2112.00257
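The definition in the abstract above (the $n^\text{th}$ harmonic number is the sum of the reciprocals of the positive integers up to $n$) can be sketched in a few lines of Python. This is only an illustration of the definition, not of the paper's integral representation, which the abstract does not spell out; the function name `harmonic` is an assumption made for this example.

```python
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n exactly.

    H_0 is taken to be the empty sum, 0. The name `harmonic` is
    hypothetical and chosen only for this sketch.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Exact rational arithmetic avoids floating-point rounding in the sum.
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, harmonic(n))
```

Using `Fraction` keeps every partial sum exact, which is convenient when checking an identity for small $n$ by hand, for instance as the base case of an induction.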
Thorsten Wittkopp, Philipp Wiesner, Dominik Scheinert, Odej Kao
Log data anomaly detection is a core component in the area of artificial intelligence for IT operations. However, the large number of existing methods makes it hard to choose the right approach for a specific system. A better understanding of different kinds of anomalies, and which algorithms are suitable for detecting them, would support researchers and IT operators. Although a common taxonomy for anomalies already exists, it has not yet been applied specifically to log data, pointing out the characteristics and peculiarities in this domain. In this paper, we present a taxonomy for different kinds of log data anomalies and introduce a method for analyzing such anomalies in labeled datasets. We applied our taxonomy to the three common benchmark datasets Thunderbird, Spirit, and BGL, and trained five state-of-the-art unsupervised anomaly detection algorithms to evaluate their performance in detecting different kinds of anomalies. Our results show that the most common anomaly type is also the easiest to predict. Moreover, deep learning-based approaches outperform data mining-based approaches in all anomaly types, but especially when it comes to detecting contextual anomalies.
Title: A Taxonomy of Anomalies in Log Data; Authors: Thorsten Wittkopp, Philipp Wiesner, Dominik Scheinert, Odej Kao; Venue: arXiv - CS - General Literature; Published: 2021-11-26; DOI URL: https://doi.org/arxiv-2111.13462
Biclustering has recently become one of the hot topics in bioinformatics and has attracted the attention of authors from several disciplines. Hence, many different methodologies from a variety of disciplines have been proposed as solutions to the biclustering problem. As a consequence, this variety of solutions makes it harder to evaluate the proposed methods. In this review paper, we aim to discuss both the analysis and the visualization of biclustering as a guide for comparing new and existing biclustering algorithms. Additionally, we concentrate on tools that provide visualizations accompanied by analysis techniques. Throughout the paper, we give several references that also serve as a short review of the state of the art for those who will pursue research on biclustering. The paper is outlined as follows: we first present the visualization and analysis methods; then we evaluate each proposed tool in terms of its visualization contribution and analysis options; finally, we discuss future directions for biclustering and propose standards for future work.
Title: A Review on Analysis and Visualization Methods for Biclustering; Authors: Melih Sozdinler; Venue: arXiv - CS - General Literature; Published: 2021-11-23; DOI URL: https://doi.org/arxiv-2111.12154
C. H. Bryan Liu, Ângelo Cardoso, Paul Couturier, Emma J. McCoy
Online Controlled Experiments (OCE) are the gold standard to measure impact and guide decisions for digital products and services. Despite many methodological advances in this area, the scarcity of public datasets and the lack of a systematic review and categorization hinder its development. We present the first survey and taxonomy for OCE datasets, which highlight the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach to enable quickly deploying improvements or rolling back degrading changes. We release the first such dataset, containing daily checkpoints of decision metrics from multiple, real experiments run on a global e-commerce platform. The dataset design is guided by a broader discussion on data requirements for common statistical tests used in digital experimentation. We demonstrate how to use the dataset in the adaptive stopping scenario using sequential and Bayesian hypothesis tests and learn the relevant parameters for each approach.
Title: Datasets for Online Controlled Experiments; Authors: C. H. Bryan Liu, Ângelo Cardoso, Paul Couturier, Emma J. McCoy; Venue: arXiv - CS - General Literature; Published: 2021-11-19; DOI URL: https://doi.org/arxiv-2111.10198
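To make the idea of adaptive stopping on daily checkpoints more concrete, here is a minimal sketch of a sequential test. It uses Wald's classic sequential probability ratio test (SPRT) for a Gaussian mean with known variance, which is a textbook stand-in and not necessarily the sequential test used by the authors. The function name `wald_sprt`, the inputs (a stream of per-checkpoint treatment-minus-control differences), and the parameters `delta1` and `sigma` are all assumptions made for this illustration.

```python
import math
from typing import Iterable, Tuple


def wald_sprt(
    diffs: Iterable[float],
    delta1: float,
    sigma: float,
    alpha: float = 0.05,
    beta: float = 0.2,
) -> Tuple[str, int]:
    """Wald's SPRT for H0: mean = 0 versus H1: mean = delta1.

    `diffs` is a hypothetical stream of observed differences, assumed
    i.i.d. Gaussian with known standard deviation `sigma`.
    Returns (decision, number of observations consumed).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it rejects H0
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0
    llr = 0.0
    n = 0
    for n, x in enumerate(diffs, start=1):
        # Log-likelihood ratio increment of N(delta1, sigma^2) vs N(0, sigma^2).
        llr += (delta1 * x - 0.5 * delta1 ** 2) / sigma ** 2
        if llr >= upper:
            return "reject_h0", n
        if llr <= lower:
            return "accept_h0", n
    # Data ran out before either boundary was crossed.
    return "continue", n


if __name__ == "__main__":
    # Made-up checkpoint differences, for illustration only.
    print(wald_sprt([0.9, 1.2, 0.7, 1.1, 1.0, 1.3], delta1=1.0, sigma=1.0))
```

The test is evaluated after every new observation, so an experiment can be stopped as soon as the accumulated evidence crosses either boundary instead of waiting for a fixed sample size.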
Augmented Reality (AR) is a technology superimposing interactional virtual objects onto a real environment. Since the beginning of the millennium, AR technologies have shown rapid growth, with significant research publications in engineering and science. However, the civil infrastructure community has minimally implemented AR technologies to date. One of the challenges that civil engineers face when understanding and using AR is the lack of a classification of AR in the context of capabilities for civil infrastructure applications. Practitioners in civil infrastructure, like most engineering fields, prioritize understanding the level of maturity of a new technology before considering its adoption and field implementation. This paper compares the capabilities of sixteen AR Head-Mounted Devices (HMDs) available in the market since 2017, ranking them in terms of performance for civil infrastructure implementations. Finally, the authors recommend a development framework for practical AR interfaces with civil infrastructure and operations.
Title: State of the Art of Augmented Reality (AR) Capabilities for Civil Infrastructure Applications; Authors: Jiaqi Xu, Derek Doyle, Fernando Moreu; Venue: arXiv - CS - General Literature; Published: 2021-10-17; DOI URL: https://doi.org/arxiv-2110.08698
In this unhinged rant, I lay out my suspicion that a lot of visualizations are bullshit: charts that do not have even the common decency to intentionally lie but are totally unconcerned about the state of the world or any practical utility. I suspect that bullshit charts take up a large fraction of the time and attention of actual visualization producers and consumers, and yet are seemingly absent from academic research into visualization design.
Title: Towards a Theory of Bullshit Visualization; Authors: Michael Correll; Venue: arXiv - CS - General Literature; Published: 2021-09-23; DOI URL: https://doi.org/arxiv-2109.12975
Yichen Tang, Jerry J. Zhang, Paul M. Corballis, Luke E. Hallum
The error-related potential (ErrP) is an event-related potential (ERP) evoked by an experimental participant's recognition of an error during task performance. ErrPs, originally described by cognitive psychologists, have been adopted for use in brain-computer interfaces (BCIs) for the detection and correction of errors, and the online refinement of decoding algorithms. Riemannian geometry-based feature extraction and classification is a new approach to BCI which shows good performance in a range of experimental paradigms, but has yet to be applied to the classification of ErrPs. Here, we describe an experiment that elicited ErrPs in seven normal participants performing a visual discrimination task. Audio feedback was provided on each trial. We used multi-channel electroencephalogram (EEG) recordings to classify ErrPs (success/failure), comparing a Riemannian geometry-based method to a traditional approach that computes time-point features. Overall, the Riemannian approach outperformed the traditional approach (78.2% versus 75.9% accuracy, p < 0.05); this difference was statistically significant (p < 0.05) in three of seven participants. These results indicate that the Riemannian approach better captured the features from feedback-elicited ErrPs, and may have application in BCI for error detection and correction.
Title: Towards the Classification of Error-Related Potentials using Riemannian Geometry; Authors: Yichen Tang, Jerry J. Zhang, Paul M. Corballis, Luke E. Hallum; Venue: arXiv - CS - General Literature; Published: 2021-09-21; DOI URL: https://doi.org/arxiv-2109.13085
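As a rough illustration of what a Riemannian geometry-based classifier can look like, the sketch below treats each trial as a covariance matrix and assigns it to the class whose reference matrix is closest under the affine-invariant Riemannian distance. Everything here is an assumption made for the example: the abstract does not specify the authors' pipeline, the code is restricted to 2x2 (two-channel) symmetric positive-definite matrices so that it needs only the standard library, and the names `airm_distance_2x2` and `nearest_reference` are hypothetical.

```python
import math
from typing import Dict, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def airm_distance_2x2(a: Matrix2, b: Matrix2) -> float:
    """Affine-invariant Riemannian distance between two 2x2 SPD matrices.

    The distance is sqrt(sum_i log(lambda_i)^2), where lambda_i are the
    eigenvalues of inverse(a) @ b. Inputs are assumed to be symmetric.
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    det_a = a11 * a22 - a12 * a21
    det_b = b11 * b22 - b12 * b21
    if a11 <= 0 or b11 <= 0 or det_a <= 0 or det_b <= 0:
        raise ValueError("both matrices must be positive definite")
    # Trace and determinant of inverse(a) @ b, from the 2x2 adjugate formula.
    trace = (a22 * b11 - a12 * b21 - a21 * b12 + a11 * b22) / det_a
    det = det_b / det_a
    # Roots of lambda^2 - trace * lambda + det = 0 (real and positive for SPD inputs).
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = (trace + disc) / 2.0
    lam2 = det / lam1
    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)


def nearest_reference(trial: Matrix2, references: Dict[str, Matrix2]) -> str:
    """Return the label whose reference matrix is closest to `trial`."""
    return min(references, key=lambda label: airm_distance_2x2(trial, references[label]))


if __name__ == "__main__":
    # Made-up reference covariances for two classes, for illustration only.
    refs = {"error": ((2.0, 0.5), (0.5, 1.0)), "correct": ((1.0, 0.0), (0.0, 1.0))}
    print(nearest_reference(((1.9, 0.4), (0.4, 1.1)), refs))
```

In practice the reference matrix for each class would typically be some form of mean of the training covariances, and real EEG recordings have many more than two channels, so a numerical linear algebra library would replace the closed-form 2x2 eigenvalue computation.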