Steven Gardner, Oleg Golovidov, J. Griffin, P. Koch, W. Thompson, B. Wujek, Yan Xu
Automated machine learning has gained a lot of attention recently. Building and selecting the right machine learning models is often a multi-objective optimization problem. General-purpose machine learning software that simultaneously supports multiple objectives and constraints is scant, though the potential benefits are great. In this work, we present a framework called Autotune that effectively handles multiple objectives and constraints that arise in machine learning problems. Autotune is built on a suite of derivative-free optimization methods, and utilizes multi-level parallelism in a distributed computing environment for automatically training, scoring, and selecting good models. Incorporating multiple objectives and constraints in the model exploration and selection process provides the flexibility needed to navigate the trade-offs that arise in practical machine learning applications. Experimental results from standard multi-objective optimization benchmark problems show that Autotune is very efficient in capturing Pareto fronts. These benchmark results also show how adding constraints can guide the search to more promising regions of the solution space, ultimately producing more desirable Pareto fronts. Results from two real-world case studies demonstrate the effectiveness of the constrained multi-objective optimization capability offered by Autotune.
{"title":"Constrained Multi-Objective Optimization for Automated Machine Learning","authors":"Steven Gardner, Oleg Golovidov, J. Griffin, P. Koch, W. Thompson, B. Wujek, Yan Xu","doi":"10.1109/DSAA.2019.00051","DOIUrl":"https://doi.org/10.1109/DSAA.2019.00051","url":null,"abstract":"Automated machine learning has gained a lot of attention recently. Building and selecting the right machine learning models is often a multi-objective optimization problem. General purpose machine learning software that simultaneously supports multiple objectives and constraints is scant, though the potential benefits are great. In this work, we present a framework called Autotune that effectively handles multiple objectives and constraints that arise in machine learning problems. Autotune is built on a suite of derivative-free optimization methods, and utilizes multi-level parallelism in a distributed computing environment for automatically training, scoring, and selecting good models. Incorporation of multiple objectives and constraints in the model exploration and selection process provides the flexibility needed to satisfy trade-offs necessary in practical machine learning applications. Experimental results from standard multi-objective optimization benchmark problems show that Autotune is very efficient in capturing Pareto fronts. These benchmark results also show how adding constraints can guide the search to more promising regions of the solution space, ultimately producing more desirable Pareto fronts. Results from two real-world case studies demonstrate the effectiveness of the constrained multi-objective optimization capability offered by Autotune.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122501158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Uplift modeling is an emerging machine learning approach for estimating the treatment effect at an individual or subgroup level. It can be used to optimize the performance of interventions such as marketing campaigns and product designs. Uplift modeling can estimate which users are likely to benefit from a treatment and then prioritize delivering or promoting the preferred experience to those users. An important but so far neglected use case for uplift modeling is an experiment with multiple treatment groups that have different costs, for example when different communication channels and promotion types are tested simultaneously. In this paper, we extend standard uplift models to support multiple treatment groups with different costs. We evaluate the performance of the proposed models using both synthetic and real data. We also describe a production implementation of the approach.
{"title":"Uplift Modeling for Multiple Treatments with Cost Optimization","authors":"Zhenyu Zhao, Totte Harinen","doi":"10.1109/dsaa.2019.00057","DOIUrl":"https://doi.org/10.1109/dsaa.2019.00057","url":null,"abstract":"Uplift modeling is an emerging machine learning approach for estimating the treatment effect at an individual or subgroup level. It can be used for optimizing the performance of interventions such as marketing campaigns and product designs. Uplift modeling can be used to estimate which users are likely to benefit from a treatment and then prioritize delivering or promoting the preferred experience to those users. An important but so far neglected use case for uplift modeling is an experiment with multiple treatment groups that have different costs, such as for example when different communication channels and promotion types are tested simultaneously. In this paper, we extend standard uplift models to support multiple treatment groups with different costs. We evaluate the performance of the proposed models using both synthetic and real data. We also describe a production implementation of the approach.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128131680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mojtaba Sedigh Fazli, Rachel V. Stadler, BahaaEddin AlAila, S. Vella, S. Moreno, G. Ward, Shannon P. Quinn
Tracking cell particles in 3D microscopy videos is a challenging task but is of great significance for modeling the motion of cells. Proper characterization of cells' shape, evolution, and movement over time is crucial to understanding and modeling the mechanobiology of cell migration in many diseases. One such disease, toxoplasmosis, is caused by the parasite Toxoplasma gondii, and roughly one-third of the world's population tests positive for T. gondii. Its virulence is linked to its lytic cycle, which is predicated on its motility and ability to enter and exit nucleated cells; therefore, studies elucidating its motility patterns are critical to the eventual development of therapeutic strategies. Here, we present a computational framework for fast and scalable detection, tracking, and identification of T. gondii motion phenotypes in 3D videos, in a completely unsupervised fashion. Our pipeline consists of several modules: preprocessing, sparsification, cell detection, cell tracking, trajectory extraction, parametrization of the trajectories, and finally a clustering step. Additionally, we identified the computational bottlenecks and developed a lightweight and highly scalable pipeline through a combination of task distribution and parallelism. Our results demonstrate both the accuracy and the performance of our method.
{"title":"Lightweight and Scalable Particle Tracking and Motion Clustering of 3D Cell Trajectories","authors":"Mojtaba Sedigh Fazli, Rachel V. Stadler, BahaaEddin AlAila, S. Vella, S. Moreno, G. Ward, Shannon P. Quinn","doi":"10.1109/DSAA.2019.00056","DOIUrl":"https://doi.org/10.1109/DSAA.2019.00056","url":null,"abstract":"Tracking cell particles in 3D microscopy videos is a challenging task but is of great significance for modeling the motion of cells. Proper characterization of the cell's shape, evolution, and their movement over time is crucial to understanding and modeling the mechanobiology of cell migration in many diseases. One in particular, toxoplasmosis is the disease caused by the parasite Toxoplasma gondii. Roughly, one-third of the world's population tests positive for T. gondii. Its virulence is linked to its lytic cycle, predicated on its motility and ability to enter and exit nucleated cells; therefore, studies elucidating its motility patterns are critical to the eventual development of therapeutic strategies. Here, we present a computational framework for fast and scalable detection, tracking, and identification of T. gondii motion phenotypes in 3D videos, in a completely unsupervised fashion. Our pipeline consists of several different modules including preprocessing, sparsification, cell detection, cell tracking, trajectories extraction, parametrization of the trajectories; and finally, a clustering step. Additionally, we identified the computational bottlenecks, and developed a lightweight and highly scalable pipeline through a combination of task distribution and parallelism. Our results prove both the accuracy and performance of our method.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130334538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple classifier systems (MCSs) have become a successful alternative for improving classification performance. However, studies have shown inconsistent results for different MCSs, and it is often difficult to predict which MCS algorithm works best on a particular problem. We believe that the two crucial steps of an MCS, base classifier generation and multiple classifier combination, need to be designed in a coordinated way to produce robust results. In this work, we show that for different testing instances, better classifiers may be trained from different subdomains of training instances, including, for example, neighboring instances of the testing instance, or even instances far away from it. To exploit this intuition, we propose the Individualized Classifier Ensemble (ICE). ICE groups training data into overlapping clusters, builds a classifier for each cluster, and then associates each training instance with the top-performing models while taking into account model types and frequency. At test time, ICE finds the k most similar training instances for a testing instance and then predicts the class label of the testing instance by averaging the predictions from the models associated with these training instances. Evaluation results on 49 benchmarks show that ICE yields a stable improvement over existing MCS methods on a significant proportion of datasets. ICE provides a novel way of utilizing internal patterns among instances to improve classification, and it can be easily combined with various classification models and applied to many application domains.
{"title":"A Novel Multiple Classifier Generation and Combination Framework Based on Fuzzy Clustering and Individualized Ensemble Construction","authors":"Zhenzhu Gao, Maryam Zand, Jianhua Ruan","doi":"10.1109/DSAA.2019.00038","DOIUrl":"https://doi.org/10.1109/DSAA.2019.00038","url":null,"abstract":"Multiple classifier system (MCS) has become a successful alternative for improving classification performance. However, studies have shown inconsistent results for different MCSs, and it is often difficult to predict which MCS algorithm works the best on a particular problem. We believe that the two crucial steps of MCS - base classifier generation and multiple classifier combination, need to be designed coordinately to produce robust results. In this work, we show that for different testing instances, better classifiers may be trained from different subdomains of training instances including, for example, neighboring instances of the testing instance, or even instances far away from the testing instance. To utilize this intuition, we propose Individualized Classifier Ensemble (ICE). ICE groups training data into overlapping clusters, builds a classifier for each cluster, and then associates each training instance to the top-performing models while taking into account model types and frequency. In testing, ICE finds the k most similar training instances for a testing instance, then predicts class label of the testing instance by averaging the prediction from models associated with these training instances. Evaluation results on 49 benchmarks show that ICE has a stable improvement on a significant proportion of datasets over existing MCS methods. ICE provides a novel choice of utilizing internal patterns among instances to improve classification, and can be easily combined with various classification models and applied to many application domains.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122997637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Cook, Bappaditya Mandal, Donna Berry, Matthew Johnson
Autism spectrum disorders (ASD) impact the cognitive, social, communicative and behavioral abilities of an individual. The development of new clinical decision support systems is important for reducing the delay between the presentation of symptoms and an accurate diagnosis. In this work, we contribute a new database consisting of video clips of typical (normal) and atypical (such as hand flapping, spinning or rocking) behaviors, displayed in natural settings, which have been collected from the YouTube video website. We propose a preliminary non-intrusive approach based on skeleton keypoint identification using pretrained deep neural networks on human body video clips to extract features and perform body movement analysis that differentiates typical and atypical behaviors of children. Experimental results on the newly contributed database show that our platform performs best with a decision tree as the classifier when compared to other popular methodologies, and it offers a baseline against which alternative approaches may be developed and tested.
{"title":"Towards Automatic Screening of Typical and Atypical Behaviors in Children With Autism","authors":"A. Cook, Bappaditya Mandal, Donna Berry, Matthew Johnson","doi":"10.1109/DSAA.2019.00065","DOIUrl":"https://doi.org/10.1109/DSAA.2019.00065","url":null,"abstract":"Autism spectrum disorders (ASD) impact the cognitive, social, communicative and behavioral abilities of an individual. The development of new clinical decision support systems is of importance in reducing the delay between presentation of symptoms and an accurate diagnosis. In this work, we contribute a new database consisting of video clips of typical (normal) and atypical (such as hand flapping, spinning or rocking) behaviors, displayed in natural settings, which have been collected from the YouTube video website. We propose a preliminary non-intrusive approach based on skeleton keypoint identification using pretrained deep neural networks on human body video clips to extract features and perform body movement analysis that differentiates typical and atypical behaviors of children. Experimental results on the newly contributed database show that our platform performs best with decision tree as the classifier when compared to other popular methodologies and offers a baseline against which alternate approaches may developed and tested.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116103403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arindam Paul, M. Mozaffar, Zijiang Yang, W. Liao, A. Choudhary, Jian Cao, Ankit Agrawal
Additive Manufacturing (AM) is a manufacturing paradigm that builds three-dimensional objects from a computer-aided design model by successively adding material layer by layer. AM has become very popular in the past decade due to its utility for fast prototyping, such as 3D printing, as well as for manufacturing functional parts with complex geometries, using processes such as laser metal deposition, that would be difficult to create with traditional machining. Because creating an intricate part from an expensive metal such as titanium is cost-prohibitive, computational models are used to simulate the behavior of AM processes before the experimental run. However, because such simulations are computationally costly and time-consuming for predicting multiscale, multi-physics phenomena in AM, physics-informed, data-driven machine-learning systems for predicting the behavior of AM processes are immensely beneficial. Such models not only accelerate multiscale simulation tools but also empower real-time control systems using in-situ data. In this paper, we design and develop essential components of a scientific framework for developing a data-driven, model-based real-time control system. Finite element methods are employed to solve time-dependent heat equations and to build the database. The proposed framework uses extremely randomized trees, an ensemble of bagged decision trees, as the regression algorithm, iteratively using the temperatures of prior voxels and laser information as inputs to predict the temperatures of subsequent voxels. The models achieve mean absolute percentage errors below 1% for predicting temperature profiles for AM processes. The code is made available for the research community at https://github.com/paularindam/ml-iter-additive.
{"title":"A Real-Time Iterative Machine Learning Approach for Temperature Profile Prediction in Additive Manufacturing Processes","authors":"Arindam Paul, M. Mozaffar, Zijiang Yang, W. Liao, A. Choudhary, Jian Cao, Ankit Agrawal","doi":"10.1109/DSAA.2019.00069","DOIUrl":"https://doi.org/10.1109/DSAA.2019.00069","url":null,"abstract":"Additive Manufacturing (AM) is a manufacturing paradigm that builds three-dimensional objects from a computer-aided design model by successively adding material layer by layer. AM has become very popular in the past decade due to its utility for fast prototyping such as 3D printing as well as manufacturing functional parts with complex geometries using processes such as laser metal deposition that would be difficult to create using traditional machining. As the process for creating an intricate part for an expensive metal such as Titanium is prohibitive with respect to cost, computational models are used to simulate the behavior of AM processes before the experimental run. However, as the simulations are computationally costly and time-consuming for predicting multiscale multi-physics phenomena in AM, physics-informed data-driven machine-learning systems for predicting the behavior of AM processes are immensely beneficial. Such models accelerate not only multiscale simulation tools but also empower real-time control systems using in-situ data. In this paper, we design and develop essential components of a scientific framework for developing a data-driven model-based real-time control system. Finite element methods are employed for solving time-dependent heat equations and developing the database. The proposed framework uses extremely randomized trees - an ensemble of bagged decision trees as the regression algorithm iteratively using temperatures of prior voxels and laser information as inputs to predict temperatures of subsequent voxels. The models achieve mean absolute percentage errors below 1% for predicting temperature profiles for AM processes. The code is made available for the research community at https://github.com/paularindam/ml-iter-additive.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122501252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Welcome to the 6th IEEE International Conference on Data Science and Advanced Analytics (DSAA’2019), the flagship annual meeting that spans the interdisciplinary fields of Data Science and Advanced Analytics. DSAA brings together researchers, industry and government practitioners, as well as developers and users of data science solutions. This creates a premier forum for an exchange of ideas on the latest theoretical developments in Data Science and on best practices for a wide range of applications. DSAA focuses on the science of data science, as well as the implications of the science for industry, government, and society. On the science side, DSAA spans all the component fields of data science, including statistics, probabilistic and mathematical modeling, machine learning, data mining and knowledge discovery, complexity science, network science, business analytics, data management, infrastructure and storage, retrieval and search, security, privacy and ethics. On the applications side, DSAA highlights case studies and poses research challenges motivated by applied work. DSAA showcases applications impacted by data science, presents tools and platforms that enable deployed data science solutions, and exposes researchers to challenges motivated by the application domains. As an alternative to the highly specialized disciplinary conferences, DSAA reflects the interdisciplinary nature of data science and analytics.
{"title":"Message from the Program Committee Co-Chairs","authors":"Dsaa, J. Yu","doi":"10.1109/wdfia.2008.4","DOIUrl":"https://doi.org/10.1109/wdfia.2008.4","url":null,"abstract":"Welcome to the 6th IEEE International Conference on Data Science and Advanced Analytics (DSAA’2019), the flagship annual meeting that spans the interdisciplinary fields of Data Science and Advanced Analytics. DSAA brings together researchers, industry and government practitioners, as well as developers and users of data science solutions. This creates a premier forum for an exchange of ideas on the latest theoretical developments in Data Science and on the best practice for a wide range of applications. DSAA focuses on the science of data science, as well as the implications of the science to industry, government, and society. On the science side, DSAA spans all the component fields of data science, including statistics, probabilistic and mathematical modeling, machine learning, data mining and knowledge discovery, complexity science, network science, business analytics, data management, infrastructure and storage, retrieval and search, security, privacy and ethics. On the applications side, DSAA highlights case studies and poses research obstacles motivated by applied work. DSAA showcases applications impacted by data science, presents tools and platforms that enable deployed data science solutions, and exposes researchers to challenges motivated by the application domains. As an alternative to the highly specialized disciplinary conferences, DSAA reflects the interdisciplinary nature of data science and analytics.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130019173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Zheng, Chun-Ta Lu, Lifang He, Sihong Xie, V. Noroozi, He Huang, Philip S. Yu
In this paper, we study the problem of modeling users' diverse interests. Previous methods usually learn a fixed user representation, which has a limited ability to represent the distinct interests of a user. In order to model users' various interests, we propose a Memory Attention-aware Recommender System (MARS). MARS utilizes a memory component and a novel attentional mechanism to learn deep adaptive user representations. Trained in an end-to-end fashion, MARS adaptively summarizes users' interests. In our experiments, MARS outperforms seven state-of-the-art methods on three real-world datasets in terms of recall and mean average precision. We also demonstrate that MARS offers strong interpretability in explaining its recommendation results, which is important in many recommendation scenarios.
{"title":"MARS: Memory Attention-Aware Recommender System","authors":"Lei Zheng, Chun-Ta Lu, Lifang He, Sihong Xie, V. Noroozi, He Huang, Philip S. Yu","doi":"10.1109/dsaa.2019.00015","DOIUrl":"https://doi.org/10.1109/dsaa.2019.00015","url":null,"abstract":"In this paper, we study the problem of modeling users' diverse interests. Previous methods usually learn a fixed user representation, which has a limited ability to represent distinct interests of a user. In order to model users' various interests, we propose a Memory Attention-aware Recommender System (MARS). MARS utilizes a memory component and a novel attentional mechanism to learn deep adaptive user representations. Trained in an end-to-end fashion, MARS adaptively summarizes users' interests. In the experiments, MARS outperforms seven state-of-the-art methods on three real-world datasets in terms of recall and mean average precision. We also demonstrate that MARS has a great interpretability to explain its recommendation results, which is important in many recommendation scenarios.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115886238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a new low-rank tensor factorization in which one mode is coded as a sparse linear combination of elements from an over-complete library. Our method, Shape Constrained Tensor Decomposition (SCTD), is based upon the CANDECOMP/PARAFAC (CP) decomposition, which produces rank-r approximations of data tensors via outer products of vectors in each dimension of the data. The SCTD model can leverage prior knowledge about the shape of factors along a given mode, for example in tensor data where one mode represents time. By constraining the vectors in the temporal dimension to known analytic forms selected from a large set of candidate functions, more readily interpretable decompositions are achieved and analytic time dependencies are discovered. The SCTD method circumvents traditional flattening techniques, in which an N-way array is reshaped into a matrix in order to perform a singular value decomposition. A clear advantage of the SCTD algorithm is its ability to extract transient and intermittent phenomena, which is often difficult for SVD-based methods. We motivate the SCTD method with several intuitively appealing results before applying it to a real-world data set to illustrate the efficiency of the algorithm in extracting interpretable spatio-temporal modes. With the rise of data-driven discovery methods, the proposed decomposition provides a viable technique for analyzing large volumes of data in a more comprehensible fashion.
{"title":"Shape Constrained Tensor Decompositions","authors":"Bethany Lusch, Eric C. Chi, J. Kutz","doi":"10.1109/DSAA.2019.00044","DOIUrl":"https://doi.org/10.1109/DSAA.2019.00044","url":null,"abstract":"We propose a new low-rank tensor factorization where one mode is coded as a sparse linear combination of elements from an over-complete library. Our method, Shape Constrained Tensor Decomposition (SCTD) is based upon the CANDECOMP/PARAFAC (CP) decomposition which produces r-rank approximations of data tensors via outer products of vectors in each dimension of the data. The SCTD model can leverage prior knowledge about the shape of factors along a given mode, for example in tensor data where one mode represents time. By constraining the vector in the temporal dimension to known analytic forms which are selected from a large set of candidate functions, more readily interpretable decompositions are achieved and analytic time dependencies discovered. The SCTD method circumvents traditional flattening techniques where an N-way array is reshaped into a matrix in order to perform a singular value decomposition. A clear advantage of the SCTD algorithm is its ability to extract transient and intermittent phenomena which is often difficult for SVD-based methods. We motivate the SCTD method using several intuitively appealing results before applying it on a real-world data set to illustrate the efficiency of the algorithm in extracting interpretable spatio-temporal modes. With the rise of data-driven discovery methods, the decomposition proposed provides a viable technique for analyzing multitudes of data in a more comprehensible fashion.","PeriodicalId":416037,"journal":{"name":"2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126468894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}