Electrocardiogram (ECG) data is commonly used in clinical practice to reveal the instantaneous status of cardiac electrophysiology, and is related to numerous heart diseases. Efficient similarity search on ECG data can assist diagnosis. However, similarity search on ECG data differs from similarity search on images: ECG data is a kind of physiological wave data, and there are no established, robust feature extraction methods for such data. Thus, we adopt a supervised framework that preserves locality based on label information while extracting effective features automatically. Experiments on real-life data show the effectiveness and efficiency of the proposed approach, FASE.
Meng Wu, Lei Li, Hongyan Li. "FASE: Feature-Based Similarity Search on ECG Data." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00044
Zhen Wang, Maohong Fan, S. Muknahallipatna, Chao Lan
This paper considers anomaly detection with multi-view data. Unlike traditional detection on single-view data, which identifies anomalies based on inconsistency between instances, multi-view anomaly detection identifies anomalies based on view inconsistency within each instance. Current multi-view detection approaches are mostly unsupervised and transductive, which may limit performance in the many applications that have labeled normal data and require efficient detection on new data. In this paper, we propose an inductive semi-supervised multi-view anomaly detection approach. We design a probabilistic generative model for normal data, which assumes the different views of a normal instance are generated from a shared latent factor, conditioned on which the views become independent. We estimate the model by maximizing its likelihood on normal data using the EM algorithm. Then, we apply the model to detect anomalies, which are instances generated with small probabilities. We evaluate our approach on nine public data sets under different multi-view anomaly settings, and show that it outperforms several state-of-the-art multi-view detection methods.
"Inductive Multi-view Semi-Supervised Anomaly Detection via Probabilistic Modeling." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00042
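The detection rule in the abstract above — fit a generative model to normal data, then flag instances it assigns low probability — can be illustrated with a plain multivariate Gaussian standing in for the paper's shared-latent-factor model (a simplifying assumption made here purely for illustration; the actual model conditions the views on a latent factor and is fit with EM):

```python
import numpy as np

def fit_gaussian(X):
    """Fit a multivariate Gaussian to normal (non-anomalous) training data."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularize
    return mu, cov

def log_density(X, mu, cov):
    """Per-instance Gaussian log-density; low values indicate anomalies."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
    return -0.5 * (quad + logdet + d * np.log(2.0 * np.pi))
```

Inductive use means scoring previously unseen instances against the model fit on normal data only, declaring an instance anomalous when its log-density falls below a threshold chosen on held-out normal data.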
Co-location pattern mining is one of the hot issues in spatial pattern mining. Similarity measures between co-location patterns can be used to solve problems such as pattern compression, pattern summarization, pattern selection, and pattern ordering. Many researchers have recently focused on this issue and used such measures to provide more concise sets of co-location patterns. Unfortunately, existing measures suffer from various weaknesses: some can only calculate the similarity between a super-pattern and its sub-patterns, while others require additional domain knowledge. In this paper, we propose a general similarity measure for any two co-location patterns. First, we study the characteristics of co-location patterns and present a novel representation model based on maximal cliques. Then, two materializations of the maximal cliques and the pattern relationship, a 0-1 vector and a key-value vector, are proposed and discussed. Based on these materialization methods, the similarity measure, Vector-Degree, is defined by applying cosine similarity. Finally, the similarity measure is used to group patterns with a hierarchical clustering algorithm. Experimental results on both synthetic and real-world data sets show the efficiency and effectiveness of the proposed method.
Pingping Wu, Lizhen Wang, Muquan Zou. "Vector-Degree: A General Similarity Measure for Co-location Patterns." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00045
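The final step of the pipeline above — cosine similarity between materialized pattern vectors — can be sketched as follows. The two 0-1 vectors are hypothetical examples (entry k marks whether the pattern participates in maximal clique k); constructing such vectors for real co-location patterns follows the paper's materialization methods:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v) / (nu * nv)

# Hypothetical 0-1 materializations over five maximal cliques.
p1 = np.array([1, 1, 0, 1, 0], dtype=float)
p2 = np.array([1, 0, 0, 1, 1], dtype=float)
vector_degree = cosine(p1, p2)  # shares cliques 0 and 3 -> 2/3
```

Patterns sharing more maximal cliques score closer to 1, which is what lets a hierarchical clustering algorithm group similar patterns together.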
E. Hunt, Binay Dahal, J. Zhan, L. Gewali, Paul Y. Oh, Ritvik Janamsetty, Chanana Kinares, Chanel Koh, Alexis Sanchez, Felix Zhan, Murat Özdemir, Shabnam Waseem, Osman Yolcu
Paraphrase Identification, or Natural Language Sentence Matching (NLSM), is one of the important and challenging tasks in Natural Language Processing: given a pair of sentences, the task is to identify whether one sentence is a paraphrase of the other. A paraphrase of a sentence conveys the same meaning, but its structure and word order vary. The task is challenging because it is difficult to infer the proper context of a sentence given its short length, and devising similarity metrics over the inferred contexts of a sentence pair is not straightforward either. Its applications, however, are numerous. This work explores various machine learning algorithms to model the task and applies different input encoding schemes. Specifically, we create models using Logistic Regression, Support Vector Machines, and different architectures of Neural Networks. Among the compared models, as expected, the Recurrent Neural Network (RNN) is best suited to our paraphrase identification task. We also propose plagiarism detection as one area where paraphrase identification can be effectively applied.
"Machine Learning Models for Paraphrase Identification and its Applications on Plagiarism Detection." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00021
Xilun Chen, L. Mathesen, Giulia Pedrielli, K. Candan
Knowledge discovery and decision making through data- and model-driven computer simulation ensembles are increasingly critical in many application domains. However, these simulation ensembles are expensive to obtain. Consequently, given a relatively small simulation budget, one needs to identify a sparse ensemble that includes the most informative simulations to support effective exploration of the input parameter space. In this paper, we propose complicacy-guided parameter space sampling (CPSS) for knowledge discovery with limited simulation budgets, which relies on a novel complicacy-driven guidance mechanism to rank candidate models and a novel rank-stability-based parameter space partitioning strategy to identify the simulation instances to execute. The advantage of the proposed approach is that, unlike purely fit-based approaches, it avoids extensive simulations in difficult-to-fit regions of the parameter space when a region can be explained by a much simpler model requiring fewer simulation samples, even at a slightly lower fit.
"Complicacy-Guided Parameter Space Sampling for Knowledge Discovery with Limited Simulation Budgets." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00015
Eryn Aguilar, Benjamin Lowe, J. Zhan, L. Gewali, Paul Y. Oh, Jevis Dancel, Deysaree Mamaud, Dorothy Pirosch, Farin Tavacoli, Felix Zhan, Robbie Pearce, Margaret Novack, Hokunani Keehu
Security is a universal concern across a multitude of sectors involved in the transfer and storage of computerized data. In the realm of cryptography, random number generators (RNGs) are integral to the creation of encryption keys that protect private data, and the production of uniform probability outcomes is a revenue source for certain enterprises (most notably the casino industry). Arbitrary thread schedule reconstruction of compare-and-swap operations is used to generate input traces for the Blum-Elias algorithm as a method for constructing random sequences, provided the compare-and-swap operations avoid cache locality. Thread access to shared memory at the memory controller is a true random source that can be polled indirectly through our algorithm with unlimited parallelism. A theoretical and experimental analysis of the observation and reconstruction algorithm is presented. The quality of the random number generator is experimentally analyzed using two standard test suites, DieHarder and ENT, on three data sets.
"Highly Parallel Seedless Random Number Generation from Arbitrary Thread Schedule Reconstruction." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00009
Hongchang Wu, Ziyu Guan, Tao Zhi, Wei Zhao, Cai Xu, Hong Han, Yaming Yang
Existing cross-modal retrieval methods are mainly constrained to the bimodal case. When applied to the multi-modal case, we need to train O(K^2) (K: the number of modalities) separate models, which is inefficient and unable to exploit common information among multiple modalities. Though some studies have focused on learning a common space of multiple modalities for retrieval, they assume data to be i.i.d. and fail to learn the underlying semantic structure, which can be important for retrieval. To tackle this issue, we propose an Adversarial Graph Attention Network for multi-modal cross-modal retrieval (AGAT). AGAT synthesizes a self-attention network (SAT), a graph attention network (GAT), and a multi-modal generative adversarial network (MGAN). The SAT generates high-level embeddings for data items from different modalities, with self-attention capturing feature-level correlations in each modality. The GAT then uses attention to aggregate embeddings of matched items from different modalities to build a common embedding space. The MGAN aims to "cluster" matched embeddings of different modalities in the common space by forcing them to be similar to the aggregation. Finally, we train the common space so that it captures the semantic structure by constraining within-class/between-class distances. Experiments on three datasets show the effectiveness of AGAT.
"Adversarial Graph Attention Network for Multi-modal Cross-Modal Retrieval." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00043
Recently, patent data analysis has attracted much attention, and patent keyword extraction is a hot problem. Most existing methods for patent keyword extraction are based on word frequency and ignore semantic information. In this paper, we propose an Unsupervised Keyword Extraction Method (UKEM) based on Chinese patent clustering. More specifically, we use a Skip-gram model to train word embeddings on a Chinese patent corpus. Each patent is then represented as a vector, called a patent vector, and these patent vectors are clustered to obtain several cluster centroids. Next, the distance between each word vector in a patent abstract and the cluster centroids is computed to indicate the semantic importance of the word. Experimental results on several Chinese patent datasets show that the proposed method outperforms several competitive methods.
Yuxin Xie, Xuegang Hu, Yuhong Zhang, Shi Li. "Unsupervised Keyword Extraction Method Based on Chinese Patent Clustering." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00048
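The scoring step described above can be sketched as follows; the toy two-dimensional embeddings and helper names are invented for illustration (the paper trains Skip-gram embeddings on a patent corpus and clusters patent vectors to obtain the centroids):

```python
import numpy as np

def patent_vector(words, emb):
    """One simple patent representation: the mean of the words' embeddings."""
    return np.mean([emb[w] for w in words if w in emb], axis=0)

def rank_keywords(words, emb, centroids):
    """Rank candidate words by distance to the nearest cluster centroid
    (closer to a centroid = semantically more central = better keyword)."""
    scored = []
    for w in sorted(set(words)):
        if w in emb:
            d = min(np.linalg.norm(emb[w] - c) for c in centroids)
            scored.append((d, w))
    return [w for d, w in sorted(scored)]
```

With real Skip-gram embeddings, words close to a cluster centroid sit near the semantic center of a patent topic, which is why the distance serves as an importance score.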
At present, user participation, as the core of a participatory sensing system, imposes costs on users, including their time, energy, and participation expenses. Giving reasonable feedback and encouragement for participation can therefore effectively improve users' initiative and the quality of the collected data. Combining data quantity, data distribution, and budget constraints, this paper proposes an improved reverse-auction incentive mechanism based on the structure of a participatory sensing system. First, taking maximization of the coverage rate and the number of samples as the optimization goal, a model combining a dynamic reverse-auction incentive strategy is designed under the task provider's limited budget. Second, building on the optimized sample-screening results, an improved location-based KDA incentive algorithm is proposed. The algorithm combines a greedy strategy that gradually decomposes the problem into subproblems to be optimized, ensuring that the optimization results come closer to the final goal. Finally, the algorithm is validated: experimental results show that it improves the number of samples and the coverage under limited budget constraints, and improves the quality of the best sample set.
Ziyi Qi, Mingxin Liu, Yanju Liang, Jing Chen. "Research on Incentive Algorithm of Participatory Sensing System Based on Location." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00035
Frank Madrid, Shima Imani, Ryan Mercer, Zachary Schall-Zimmerman, N. S. Senobari, Eamonn J. Keogh
Many time series analytic tasks can be reduced to discovering, and then reasoning about, conserved structures, or time series motifs. Recently, the Matrix Profile has emerged as the state of the art for finding time series motifs, allowing the community to efficiently find motifs in large datasets. The Matrix Profile reduced time series motif discovery to a process requiring a single parameter: the length of the time series motifs we expect (or wish) to find. In many cases this is a reasonable limitation, as the user may utilize out-of-band information or domain knowledge to set this parameter. However, in truly exploratory data mining, a poor choice of this parameter can result in failing to find unexpected and exploitable regularities in the data. In this work, we introduce the Pan Matrix Profile, a new data structure which contains the nearest-neighbor information for all subsequences of all lengths. This data structure allows the first truly parameter-free motif discovery algorithm in the literature. The sheer volume of information produced by our representation may be overwhelming; thus, we also introduce a novel visualization tool called the motif-heatmap, which allows users to discover and reason about repeated structures at a glance. We demonstrate our ideas on a diverse set of domains including seismology, bioinformatics, transportation and biology.
"Matrix Profile XX: Finding and Visualizing Time Series Motifs of All Lengths using the Matrix Profile." 2019 IEEE International Conference on Big Knowledge (ICBK), Nov. 2019. doi:10.1109/ICBK.2019.00031
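For intuition, a deliberately naive O(n^2) matrix profile for one fixed subsequence length m can be written as below; production Matrix Profile algorithms are far faster, and the Pan Matrix Profile described above stacks such profiles over a whole range of lengths m:

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence (constant subsequences are only centered)."""
    s = x.std()
    return (x - x.mean()) / (s if s > 1e-12 else 1.0)

def matrix_profile(ts, m):
    """Distance from each length-m subsequence to its nearest non-trivial match."""
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    mp = np.full(n, np.inf)
    excl = m // 2  # exclusion zone: ignore trivially overlapping neighbors
    for i in range(n):
        for j in range(n):
            if abs(i - j) > excl:
                mp[i] = min(mp[i], np.linalg.norm(subs[i] - subs[j]))
    return mp
```

The top motif pair sits at the two lowest values of the profile; planting the same pattern twice in a noisy series drives the profile to zero at both locations.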