
2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Papers

Towards Automated Breast Mass Classification using Deep Learning Framework
Pinaki Ranjan Sarkar, Priya Prabhakar, Deepak Mishra, Gorthi R. K. S. S. Manyam
Due to high variability in shape, structure, and occurrence, non-palpable breast masses are often missed even by experienced radiologists. To aid more accurate identification, computer-aided detection (CAD) systems are widely used. Most existing CAD systems rely on complex handcrafted features, which makes further performance improvement difficult. Deep, high-level features extracted with deep learning models have already proven superior to low- and mid-level handcrafted features. In this paper, we propose an automated deep CAD system that performs both mass detection and classification. The proposed framework is composed of three cascaded stages: suspicious region identification, mass/no-mass detection, and mass classification. To detect suspicious regions in a mammogram, we use a deep hierarchical mass prediction network. We then decide whether the predicted lesions contain any abnormal masses using high-level CNN features computed from the augmented intensity and wavelet features. Afterwards, mass classification is carried out only for the abnormal cases, using the same CNN structure. The whole process, including the extraction of wavelet features, is automated. We tested the proposed model on the widely used DDSM and INbreast databases, on which the mass prediction network achieved sensitivities of 0.94 and 0.96, followed by mass/no-mass detection with areas under the receiver operating characteristic (ROC) curve (AUC) of 0.9976 and 0.9922, respectively. Finally, the classification network obtained an accuracy of 98.05% on DDSM and 98.14% on INbreast, which we believe is the best reported so far.
Pub Date : 2019-10-01 DOI: 10.1109/DSAA.2019.00060
Citations: 4
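The breast mass abstract above reports sensitivity and area under the ROC curve (AUC). As a reminder of what those two numbers measure, here is a minimal sketch in plain Python. It is not the authors' evaluation code; the labels, scores, and the 0.5 threshold are made up for illustration.

```python
def sensitivity(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold

print(sensitivity(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Sensitivity depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why an abstract can sensibly report both.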
Deep Crowd Counting In Congested Scenes Through Refine Modules
Tong Li, Chuan Wang, Xiaochun Cao
Crowd counting, which aims to predict the number of people in a highly congested scene, has been widely explored and is used in applications such as video surveillance and pedestrian flow. Severe mutual occlusion among people, large perspective distortion, and scale variations hinder accurate estimation. Although existing approaches have made much progress, there is still room for improvement. The drawbacks of existing methods are twofold: (1) scale information, an important factor for crowd counting, is insufficiently explored, which degrades the estimates; (2) using a unified framework for the whole image may result in rough estimates in subregions and thus an inaccurate overall estimate. Motivated by this, we propose a new method to address these problems. We first construct a crowd-specific, scale-aware convolutional neural network, which accounts for crowd scale variations and integrates multi-scale feature representations in the Cross Scale Module (CSM), to produce the initial predicted density map. The proposed Local Refine Modules (LRMs) are then applied to gradually re-estimate the predictions of subregions. We conduct experiments on three crowd counting datasets (ShanghaiTech, UCF_CC_50, and UCSD). Experiments show that the proposed method achieves superior performance compared with the state of the art. In addition, experiments on counting vehicles in the TRANCOS dataset yield better results, which demonstrates the generalization ability of the proposed method.
Pub Date : 2019-10-01 DOI: 10.1109/DSAA.2019.00033
Citations: 0
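The crowd counting abstract above works with a predicted density map and with per-subregion estimates. In density-map-based counting, the count for any region is the sum of the map over that region. The sketch below shows only that relationship on a tiny synthetic map. It does not implement the CSM or LRM networks; the head positions, the map size, and the Gaussian width `sigma` are assumptions chosen for illustration.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Place one unit-mass Gaussian per annotated head position (row, col)."""
    dmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in points:
        weights = [[math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                    for c in range(width)] for r in range(height)]
        total = sum(map(sum, weights))
        for r in range(height):
            for c in range(width):
                # Normalizing inside the image makes each head contribute mass 1.
                dmap[r][c] += weights[r][c] / total
    return dmap


def region_count(dmap, r_lo, r_hi, c_lo, c_hi):
    """Estimated count inside the half-open window [r_lo, r_hi) x [c_lo, c_hi)."""
    return sum(sum(row[c_lo:c_hi]) for row in dmap[r_lo:r_hi])


heads = [(3, 4), (10, 12), (11, 13)]  # made-up annotations
dmap = density_map(heads, 16, 16)
total = region_count(dmap, 0, 16, 0, 16)
top = region_count(dmap, 0, 8, 0, 16)
bottom = region_count(dmap, 8, 16, 0, 16)
print(total, top, bottom)
```

Because subregion counts add up to the image count, improving the estimate of one subregion directly changes the total, which is the property that a local refinement step relies on.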
Joint Selection of Central and Extremal Prototypes Based on Kernel Minimum Enclosing Balls
C. Bauckhage, R. Sifa
We present a simple, two-step procedure that selects central and extremal prototypes from a given set of data. The key idea is to identify minima of the function that characterizes the interior of a kernel minimum enclosing ball of the data. We discuss how to efficiently compute kernel minimum enclosing balls using the Frank-Wolfe algorithm and show that, for Gaussian kernels, the sought-after prototypes can be naturally found via a variant of the mean shift procedure. Practical results demonstrate that prototypes found this way are descriptive, meaningful, and interpretable.
Pub Date : 2019-10-01 DOI: 10.1109/DSAA.2019.00040
Citations: 3
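The abstract above mentions computing kernel minimum enclosing balls with the Frank-Wolfe algorithm. A minimal sketch of that idea follows, assuming the standard dual formulation (maximize mu^T diag(K) - mu^T K mu over the probability simplex) and the textbook step size 2/(t+2). The paper's exact variant, its Gaussian-kernel setup, and the mean-shift prototype search are not reproduced here; the example uses a linear kernel only because the answer can be checked by geometry.

```python
def kernel_matrix(X, kernel):
    """Gram matrix K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def kernel_meb_frank_wolfe(K, n_iter=500):
    """Approximate dual weights mu and squared radius of the kernel MEB."""
    n = len(K)
    mu = [1.0 / n] * n  # start at the centre of the simplex
    for t in range(n_iter):
        Kmu = [sum(K[i][j] * mu[j] for j in range(n)) for i in range(n)]
        grad = [K[i][i] - 2.0 * Kmu[i] for i in range(n)]
        i_best = max(range(n), key=grad.__getitem__)  # best simplex vertex
        step = 2.0 / (t + 2.0)
        mu = [(1.0 - step) * m for m in mu]
        mu[i_best] += step
    Kmu = [sum(K[i][j] * mu[j] for j in range(n)) for i in range(n)]
    radius_sq = (sum(mu[i] * K[i][i] for i in range(n))
                 - sum(mu[i] * Kmu[i] for i in range(n)))
    return mu, radius_sq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy data: the enclosing ball is centred near the origin with radius near 1.
X = [(-1.0, 0.0), (1.0, 0.0), (0.0, 0.1)]
mu, radius_sq = kernel_meb_frank_wolfe(kernel_matrix(X, dot))
center = [sum(m * x[d] for m, x in zip(mu, X)) for d in range(2)]
print(center, radius_sq)
```

Each iteration touches a single vertex of the simplex, so the weight vector stays sparse and most of the mass ends up on the points that support the ball. With a different `kernel` function the same loop applies, although the centre then lives in feature space and is only available implicitly through `mu`.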
Augmenting U.S. Census data on industry and occupation of respondents
P. Meyer, Kendra Asher
The U.S. Census Bureau classifies survey respondents into hundreds of detailed industry and occupation categories. The classification systems change periodically, creating breaks in time series. Standard crosswalks and unified category systems bridge the periods, but these often leave sparse or empty cells or induce sharp changes in time series. We propose a methodology to predict standardized industry, occupation, and related variables for each employed respondent in the public-use samples from recent Censuses of Population and CPS data. Unlike earlier approaches, predictions draw on microdata for each individual and on large training data sets. Tests of the resulting “augmented” data sets can evaluate their consistency with known trends, smoothness criteria, and benchmarks.
Pub Date : 2019-10-01 DOI: 10.1109/dsaa.2019.00076
Citations: 0
Cross-Media Image-Text Retrieval Combined with Global Similarity and Local Similarity
Zhixin Li, Feng Ling, Canlong Zhang
In this paper, we study image-text matching with the goal of better semantic alignment between images and text. Previous work simply used pre-trained networks to extract image and text features and projected them directly into a common subspace, varied the loss function on this basis, or used an attention mechanism to directly match image region proposals with text phrases. These approaches do not match the semantics of image and text well. In this study, we propose a cross-media retrieval method based on global and local representations. We construct a cross-media two-level network, containing subnets that handle global and local features, to explore better semantic matching between images and text. Specifically, we use a self-attention network to obtain a macro representation of the global image, and we also use local fine-grained patches with an attention mechanism. We then use a two-level alignment framework in which the two levels promote each other to learn different representations for cross-media retrieval. The innovation of this study lies in using more comprehensive image and text features to design the two kinds of similarity and adding them together. Experimental results on the Flickr30K and MS-COCO datasets show that the method is effective for image-text retrieval and achieves better recall than many current advanced cross-media retrieval models.
Pub Date : 2019-10-01 DOI: 10.1109/DSAA.2019.00029
Citations: 3
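The retrieval abstract above combines a global and a local similarity and evaluates by recall. The sketch below illustrates that evaluation on toy similarity matrices. The abstract does not say how the two similarities are combined, so the weighted sum and its `weight` parameter are assumptions, and the numbers are invented. Nothing here reproduces the paper's networks.

```python
def combine(global_sim, local_sim, weight=0.5):
    """Weighted sum of two similarity matrices (rows: images, cols: captions).
    The weighting scheme is a hypothetical choice for illustration."""
    return [[weight * g + (1.0 - weight) * l for g, l in zip(g_row, l_row)]
            for g_row, l_row in zip(global_sim, local_sim)]


def recall_at_k(sim, k):
    """Fraction of queries (rows) whose true match (same index) is in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Toy scores: image i matches caption i.
global_sim = [[0.9, 0.2, 0.1],
              [0.3, 0.4, 0.5],
              [0.2, 0.1, 0.8]]
local_sim = [[0.8, 0.1, 0.3],
             [0.2, 0.9, 0.3],
             [0.4, 0.3, 0.6]]
sim = combine(global_sim, local_sim)

print(recall_at_k(global_sim, 1), recall_at_k(sim, 1))
```

In this toy case the global score alone ranks the wrong caption first for the second image, and the local score corrects it once the two are combined.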
DSAA Keynotes
Pub Date : 2019-10-01 DOI: 10.1109/dsaa.2019.00011
Citations: 0
Don't Cry Wolf
Pub Date : 2019-10-01 DOI: 10.1109/DSAA46601.2019.9062728
Philip E. Brown, T. Dasu, Y. Kanza, E. Koutsofios, R. Malik, D. Srivastava
Real-world anomaly management systems oversee thousands of dynamic data streams and generate an overwhelming number of alerts. As a consequence, important alerts often go unnoticed until there is a crisis. The absence of ground truth and the fact that the streams are constantly changing (new content, new applications, software and hardware changes) make assessing the value of alerts difficult. In order to identify groups of important and actionable alerts, we propose: (1) super-alerts that reflect characteristics of persistence, pervasiveness, and priority; (2) three types of super-alerting based on three types of aggregations; and (3) corresponding metrics for evaluating them. We demonstrate the approach using real-world entertainment data streams.
Citations: 1
Range Analysis and Applications to Root Causing
Z. Khasidashvili, A. Norman
We propose a supervised learning algorithm whose aim is to derive features that explain the response variable better than the original features. Moreover, when the distinction between positive and negative samples is meaningful, our aim is to derive features that explain the positive samples, or subsets of positive samples that have the same root cause. Each derived feature represents a single- or multi-dimensional subspace of the feature space, where each dimension is specified as a feature-range pair for numeric features and as a feature-level pair for categorical features. Unlike most Rule Learning and Subgroup Discovery algorithms, the response variable can be numeric, and our algorithm does not require a discretization of the response. The algorithm has been applied successfully to numerous real-life root-causing tasks in chip design, manufacturing, and validation at Intel.
Pub Date : 2019-10-01 DOI: 10.1109/DSAA.2019.00045
Citations: 3
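The Range Analysis abstract above describes a derived feature as a subspace built from feature-range pairs (numeric features) and feature-level pairs (categorical features), with a response that may stay numeric. The sketch below shows one way to represent and evaluate such a derived feature. It does not show how the algorithm searches for good ranges. The feature names (`voltage`, `fab`, `delay`), the data, and the closed-interval convention are hypothetical.

```python
def in_subspace(sample, conditions):
    """True if the sample satisfies every condition of a derived feature.

    A numeric condition is (feature, (low, high)), meaning low <= value <= high.
    A categorical condition is (feature, level), meaning value == level.
    """
    for feature, spec in conditions:
        value = sample[feature]
        if isinstance(spec, tuple):
            low, high = spec
            if not (low <= value <= high):
                return False
        elif value != spec:
            return False
    return True


def mean_response(samples, conditions, response):
    """Mean of a numeric response over the samples inside the subspace."""
    selected = [s[response] for s in samples if in_subspace(s, conditions)]
    return sum(selected) / len(selected) if selected else None


# Made-up samples with a numeric response ("delay"); no discretization needed.
samples = [
    {"voltage": 0.95, "fab": "A", "delay": 1.0},
    {"voltage": 1.05, "fab": "B", "delay": 3.0},
    {"voltage": 1.10, "fab": "B", "delay": 5.0},
    {"voltage": 1.20, "fab": "A", "delay": 1.0},
]
derived = [("voltage", (1.0, 1.15)), ("fab", "B")]  # a two-dimensional subspace

print(mean_response(samples, derived, "delay"))
print(mean_response(samples, [], "delay"))
```

Comparing the mean response inside the subspace with the overall mean gives a simple indication of whether the derived feature separates unusual samples from the rest.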
Sensor-Based Human Activity Mining Using Dirichlet Process Mixtures of Directional Statistical Models
L. Fang, Juan Ye, S. Dobson
We have witnessed an increasing number of activity-aware applications being deployed in real-world environments, including smart homes and mobile healthcare. The key enabler of these applications is sensor-based human activity recognition, that is, recognising and analysing daily human activities from wearable and ambient sensors. With the power of machine learning, we can recognise complex correlations between various types of sensor data and the activities being observed. However, challenges remain: (1) existing approaches often rely on a large amount of labelled training data to build the model, and (2) they cannot dynamically adapt the model to emerging or changing activity patterns over time. To directly address these challenges, we propose a Bayesian nonparametric model, namely a Dirichlet process mixture of conditionally independent von Mises-Fisher models, to enable both unsupervised and semi-supervised dynamic learning of human activities. The model can dynamically adapt itself to evolving activity patterns without human intervention, and the learning results can be used to reduce the annotation effort. We evaluate our approach on real-world, third-party smart home datasets and demonstrate significant improvements over state-of-the-art techniques in both unsupervised and supervised settings.
Pub Date : 2019-10-01 DOI: 10.1109/DSAA.2019.00030
Citations: 0
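The activity mining abstract above relies on a Dirichlet process mixture so that new activity clusters can appear as data arrive. One common way to see why this works is the Chinese restaurant process view of the Dirichlet process prior: a new observation joins an existing cluster with probability proportional to its size, or opens a new cluster with probability proportional to the concentration parameter. The sketch below computes only these prior probabilities. It omits the von Mises-Fisher likelihood and the paper's inference procedure, and the cluster sizes and `alpha` value are made up.

```python
def crp_assignment_probs(cluster_sizes, alpha):
    """Prior probability of joining each existing cluster, followed by the
    probability of opening a new cluster (last entry)."""
    n = sum(cluster_sizes)
    probs = [size / (n + alpha) for size in cluster_sizes]
    probs.append(alpha / (n + alpha))
    return probs


# Three existing activity clusters with 5, 3, and 2 observations (toy values).
probs = crp_assignment_probs([5, 3, 2], alpha=1.0)
print(probs)
```

The last entry never reaches zero, so the model always keeps some probability of a previously unseen activity. In a full mixture model these prior weights would be multiplied by each cluster's likelihood for the new observation before normalizing.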
Colorwall: An Embedded Temporal Display of Bibliographic Data
Jing Ming, Li Zhang
A bibliographic data set is often visualized as a network to depict relationships among authors. However, static networks display only minimal information when a data set contains temporal features. This paper proposes an embedded network visualization that presents concealed temporal patterns in a data set and leverages multiple intelligent filters to reduce occlusion. We compare different graphing styles, such as feature representation and time direction, and then determine the best approach for displaying temporal features. We demonstrate the usability of our approach with case studies and an evaluation on the IEEE InfoVis and VAST conference data set.
Pub Date : 2019-10-01 DOI: 10.1109/DSAA.2019.00063
Citations: 2