Latest Literature in 车间管理 (Workshop Management)

Session details: Regular Paper Session II

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/3257875

Mouna Kacimi

Citations: 0

Session details: Keynote Address

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/3257873

Mouna Kacimi

Citations: 0

Efficient Top-k Query Answering through its Top-N Rewritings Using Views

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809895

Wissem Labbadi, J. Akaichi

Recently, various algorithms have been proposed to speed up top-k query answering by using multiple materialized query results. However, most of these algorithms require a potentially costly view selection step. In fact, its processing cost has been shown to be linear in the number of views, which can become prohibitive when a large number of views must be considered. In this paper, we address the problem of identifying the top-N most promising views for answering a top-k query in the presence of a collection of views. We propose a novel algorithm for this problem that aims to significantly reduce query execution time by considering only the minimal set of rewritings likely needed to return the top-k tuples of a top-k query. We also consider how to efficiently exploit the output of the rewriting algorithm to retrieve the top-k tuples, and propose two possible solutions. The results of a thorough experimental study indicate that the proposed algorithm offers a robust solution to efficient top-k query answering using views, since it discards non-promising query rewritings from the view selection process.
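
A minimal Python sketch of the general idea, not the authors' algorithm: each materialized view is modelled as a list of (score, tuple id) pairs already sorted by the query's ranking function, views are ranked by a hypothetical "promise" heuristic (the score of their best tuple), only the top N are kept, and a lazy heap merge returns the top-k distinct tuples. The view representation, the heuristic, and all function names are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical view model: (score, tuple_id) pairs sorted by descending score.
View = List[Tuple[float, str]]


def select_top_n_views(views: Dict[str, View], n: int) -> List[str]:
    """Keep the n non-empty views whose best tuple scores highest.

    This "promise" heuristic is an illustrative stand-in, not the paper's criterion.
    """
    ranked = sorted(
        (name for name, rows in views.items() if rows),
        key=lambda name: views[name][0][0],
        reverse=True,
    )
    return ranked[:n]


def top_k_from_views(views: Dict[str, View], n: int, k: int) -> List[Tuple[float, str]]:
    """Answer a top-k query by merging only the top-n selected views."""
    chosen = select_top_n_views(views, n)
    # heapq.merge lazily merges sorted inputs; negating the score gives descending order.
    merged = heapq.merge(*(views[name] for name in chosen), key=lambda row: -row[0])
    seen = set()
    result = []
    for score, tid in merged:
        if tid in seen:  # the same base tuple may appear in several views
            continue
        seen.add(tid)
        result.append((score, tid))
        if len(result) == k:
            break  # stop reading the views as soon as k tuples are found
    return result


views = {
    "v1": [(0.9, "t1"), (0.7, "t4"), (0.2, "t8")],
    "v2": [(0.8, "t2"), (0.7, "t4"), (0.6, "t5")],
    "v3": [(0.3, "t9"), (0.1, "t7")],
}
print(top_k_from_views(views, n=2, k=3))  # [(0.9, 't1'), (0.8, 't2'), (0.7, 't4')]
```

Because `heapq.merge` is lazy, the loop stops consuming the selected views once k distinct tuples are emitted, and views outside the top N are never scanned. The trade-off is that a tuple appearing only in a discarded view can be missed, which is why the choice of views matters.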

Citations: 1

Sparse Kernel Clustering of Massive High-Dimensional Data sets with Large Number of Clusters

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809896

Radha Chitta, Anil K. Jain, Rong Jin

In clustering applications involving documents and images, not only are the number of data points (N) and their dimensionality (d) large, but so is the number of clusters (C) into which the data must be partitioned. Kernel-based clustering algorithms, which have been shown to outperform linear clustering algorithms, have high running-time complexity in terms of N, d, and C. We propose an efficient sparse kernel k-means clustering algorithm that incrementally samples the most informative points from the data set using importance sampling and constructs a sparse kernel matrix from these sampled points. Each row of this matrix holds a data point's similarity to its p nearest neighbors among the sampled points (p ≪ N). This sparse kernel matrix is then used to perform clustering and obtain the cluster labels. Combining sampling with sparsity reduces both the running time and the memory complexity of kernel clustering. To further improve efficiency, the algorithm projects the data onto the top C eigenvectors of the sparse kernel matrix and clusters these eigenvectors using a modified k-means algorithm. The running time of the proposed sparse kernel k-means algorithm is linear in N and d, and logarithmic in C. We show analytically that only a small number of points need to be sampled from the data set and that the resulting approximation error is well bounded. Using several large, high-dimensional text and image data sets, we demonstrate that the proposed algorithm is significantly faster than classical kernel-based clustering algorithms while maintaining clustering quality.
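
To illustrate the sparse kernel matrix described in the abstract, this Python sketch computes, for every data point, Gaussian (RBF) kernel values to a sample of m points and keeps only the p largest, so storage grows with N·p rather than N². Uniform random sampling stands in for the paper's importance sampling, and the RBF kernel, the bandwidth `gamma`, the sample size `m`, and the function names are assumptions. The eigenvector projection and modified k-means steps are not shown.

```python
import math
import random
from typing import Dict, List, Sequence

Point = Sequence[float]


def rbf(x: Point, y: Point, gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sparse_kernel_rows(data: List[Point], m: int, p: int, gamma: float = 1.0,
                       seed: int = 0) -> List[Dict[int, float]]:
    """Return one sparse row per data point: kernel values to its p most similar sampled points.

    Uniform sampling replaces the paper's importance sampling (an assumption).
    Row keys are column indices into the sampled subset, so each row holds
    at most p of the m possible entries.
    """
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(data)), m)
    rows = []
    for x in data:
        sims = [(rbf(x, data[j], gamma), col) for col, j in enumerate(sample_idx)]
        sims.sort(reverse=True)  # most similar sampled points first
        rows.append({col: s for s, col in sims[:p]})
    return rows


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
rows = sparse_kernel_rows(data, m=4, p=2, gamma=0.5)
```

Each row is a plain dict keyed by sample column, a simple stand-in for a sparse-matrix format. A realistic version would likely use a proper sparse matrix and a nearest-neighbor index instead of scoring every sampled point for every row.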

Citations: 8

Proceedings of the 8th Workshop on Ph.D. Workshop in Information and Knowledge Management

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/2809890

Mouna Kacimi, N. Preda, Maya Ramanath

The publication date is one day earlier than the EST date so that the proceedings are available to attendees in Australia on the first day of the conference.

It is our pleasure to host PIKM, the PhD Workshop in Information and Knowledge Management, in conjunction with the ACM CIKM 2015 conference in Melbourne, Australia. PIKM has been a popular event at CIKM since its inception in 2007. This is the 8th edition of PIKM, and it has attracted participants from all over the world.

PIKM gives PhD students worldwide an opportunity to present their dissertation proposals and/or early doctoral research and to gain recognition for their work. It provides them with valuable feedback, at a relatively early stage, from experts in their field in academia and industry. This helps them assess the novelty, technical contributions, and real-world applications of their work. Moreover, PIKM presents established researchers in information and knowledge management with a panorama of upcoming doctoral work. It gives them an idea of the interesting topics that attract new doctoral students, and could help them tap this potential at an early stage through summer internships, research collaborations, and more.

PIKM 2015 received 16 submissions, of which 5 were accepted as full papers. A significant highlight of PIKM 2015 is that every accepted paper receives both a poster and an oral presentation, to increase visibility and interaction. Another distinguishing feature this year is a career development session consisting of a mentoring presentation by an experienced researcher, which emphasizes the importance of seeking opportunities and developing the skills needed to succeed after the PhD.

We encourage participants to attend the keynote and the invited talk. These valuable and insightful talks can help PhD students in their careers:
Keynote: "Why Researchers are Managers", Dr. Gerard de Melo (Tsinghua University, China)
Invited talk in the career development session: "Beyond The Thesis: Completing A Successful PhD", Prof. Justin Zobel (University of Melbourne, Australia)

The PIKM 2015 team includes Program Committee members from 11 countries spanning 4 continents, with a good balance of industry and academia. We thank the reviewers for providing quick and useful feedback to the students amid their busy work schedules. In recent years, PIKM has given a best reviewer award to honor the exceptional contributions of a PC member, analogous to the best paper award that recognizes outstanding PhD student research. This year, the best reviewer award goes to Shady Elbassuoni from the American University of Beirut, Lebanon. We sincerely applaud him for the time and effort he invested in providing excellent and detailed reviews. The best paper award will be announced during the PIKM workshop at the CIKM conference. Both awards consist of ACM certificates.

Citations: 0

A Generative Model For Time Series Discretization Based On Multiple Normal Distributions

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809892

S. Gandhi, T. Oates, Arnold P. Boedihardjo, Crystal Chen, Jessica Lin, Pavel Senin, S. Frankenstein, Xing Wang

Discretization is a crucial first step in several time series mining applications. Our research proposes a novel method for discretizing time series data and develops a similarity score based on the discretized representation. The similarity score allows us to compare two time series and to perform pattern learning tasks such as clustering, classification, and anomaly detection. We propose a generative model for discretization based on multiple normal distributions and develop an optimization technique to learn the parameters of these normal distributions. To show the effectiveness of our approach, we perform comprehensive classification experiments on datasets from the UCR time series repository.
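
The abstract does not spell out the model, so this Python sketch only illustrates the general shape of discretization with multiple normal distributions: the series is z-normalized, each value is mapped to the letter of the normal component under which it is most likely, and two symbol strings are compared position by position. The component parameters are fixed by hand here, whereas the paper learns them with an optimization technique, and the match-fraction `similarity` is a hypothetical placeholder for the paper's score.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Component = Tuple[float, float]  # (mean, standard deviation) of one normal distribution


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def z_normalize(series: Sequence[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sd = mean(series), pstdev(series)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in series]


def discretize(series: Sequence[float], components: Sequence[Component]) -> str:
    """Map each point to the letter of the normal component with the highest density."""
    symbols = []
    for v in z_normalize(series):
        best = max(range(len(components)), key=lambda i: normal_pdf(v, *components[i]))
        symbols.append(chr(ord("a") + best))
    return "".join(symbols)


def similarity(s1: str, s2: str) -> float:
    """Fraction of positions with equal symbols (a hypothetical stand-in score)."""
    if len(s1) != len(s2):
        raise ValueError("symbol strings must have equal length")
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)


# Assumed, hand-picked components; the paper learns these parameters instead.
components = [(-1.0, 0.5), (0.0, 0.5), (1.0, 0.5)]
print(discretize([1.0, 2.0, 3.0], components))  # abc
```

With equal standard deviations, as in these example components, choosing the highest density is the same as choosing the nearest mean; components with different widths would shift the boundaries between symbols.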

Citations: 4

Session details: Career Development Session (Invited Talk)

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/3257876

N. Preda

Citations: 0

Topic Detection from Large Scale of Microblog Stream with High Utility Pattern Clustering

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809894

Jiajia Huang, Min Peng, Hua Wang

With the popularity of social media, detecting topics in microblog streams has become an increasingly important task. However, it is challenging because microblog streams are high-dimensional, consist of short and noisy content, change quickly, and arrive in huge volumes. In this paper, we propose a high utility pattern clustering (HUPC) framework for microblog streams. The framework first extracts a group of representative patterns from the microblog stream and then groups these patterns into topic clusters. This approach works well on large-scale microblog streams because it clusters patterns that better describe topics, rather than clustering noise and individual microblogs directly. Furthermore, the proposed framework can detect coherent topics and newly emerging topics simultaneously. Extensive experimental results on Twitter and Sina Weibo streams show that the method outperforms other existing topic detection methods, making it a desirable solution for detecting events in microblog streams.
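
The abstract does not define "utility", so this Python sketch assumes the standard high-utility itemset definition: a post's term counts act as internal utility, a hypothetical per-term weight acts as external utility, and a pattern's utility is the sum, over posts containing every term of the pattern, of those weighted counts. It enumerates word pairs by brute force and keeps those above a threshold; it is not an efficient HUP miner, the function names are illustrative, and the step that clusters patterns into topics is omitted.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List


def pattern_utility(pattern: FrozenSet[str], posts: List[Counter],
                    weight: Dict[str, float]) -> float:
    """Sum weighted term counts over posts that contain every term of the pattern.

    Terms missing from `weight` get a default weight of 1.0 (an assumption).
    """
    total = 0.0
    for post in posts:
        if all(post[term] > 0 for term in pattern):
            total += sum(post[term] * weight.get(term, 1.0) for term in pattern)
    return total


def high_utility_pairs(posts: List[Counter], weight: Dict[str, float],
                       min_util: float) -> Dict[FrozenSet[str], float]:
    """Brute-force every word pair and keep those whose utility reaches min_util."""
    vocab = sorted(set().union(*posts))
    result = {}
    for a, b in combinations(vocab, 2):
        pattern = frozenset((a, b))
        utility = pattern_utility(pattern, posts, weight)
        if utility >= min_util:
            result[pattern] = utility
    return result


# Toy stream: each post is a bag of words (term -> count).
posts = [
    Counter("quake tokyo quake".split()),
    Counter("quake tokyo".split()),
    Counter("lunch tokyo".split()),
]
weight = {"quake": 2.0}  # hypothetical external utility; other terms default to 1.0
pairs = high_utility_pairs(posts, weight, min_util=3.0)
```

In this example, the pair ("quake", "tokyo") reaches a utility of 8.0 (2·2 + 1 from the first post, 1·2 + 1 from the second), while ("lunch", "tokyo") scores only 2.0 and falls below the threshold.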

Citations: 35

Beyond The Thesis: Completing A Successful PhD

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/2809890.2815473

J. Zobel

Many factors lead students to undertake a PhD. A student may, for example, be intellectually curious, and want to pursue an interest or understand a problem; or may be adventurous, and want to make a significant discovery; or be entrepreneurial, and want to create an innovation; or want to work with a particular scientist; or want to continue to participate in life on campus. Students may regard a PhD as an opportunity to acquire deep training in research in the field, and perhaps to distinguish themselves by completing a piece of major work, acquiring the title of 'doctor', and becoming a scientist. Perhaps surprisingly, many students seem to give only limited attention to the details of what their next step will be, even at the end of the PhD. While they may have a general goal to become an academic or researcher, these students have not explored what is involved in reaching that goal. Yet the activities of the PhD, perhaps even in the first year, can help shape each student's career. In particular, students need to be aware of their need to develop skills, and acquire experience, in areas beyond the core activities of research. Students do use the PhD to develop themselves. At the start of their PhDs, students are highly diverse, with individual strengths and weaknesses. The task of completing the PhD to some extent normalizes these differences: students find that they have to address their shortcomings, while exploiting their existing skills as they build an initial body of research. However, this development tends to be focused on the skills needed for the PhD itself: writing, speaking, managing data, analysis of literature, design of experiments, and so on. Yet a PhD is also an opportunity for students to develop more broadly, and to position themselves for the career of their choice.
Some students do not take advantage of this opportunity, while others, in their haste to finish, sidestep some of the aspects of PhD study from which they have the most to learn. In particular, an often-overlooked aspect of PhD study is that it can be a period of intense personal development. The demands of undertaking such a long, concentrated piece of work can lead to intellectual rigor, intellectual independence, systematic work habits, and, perhaps most crucially, deepened self-assessment. The most successful scientists are not just technically capable, imaginative, lucid, and so on, but are aware of their limitations. In some cases these can be rectified through discipline and study; in others, they are factors to consider when choosing or shaping a career. Thus an effective student should approach the end of the PhD strategically, seeking opportunities to develop the qualities that will ease the transition to the next career step, while taking a clear-eyed view of the likelihood of success in different kinds of work.

Citations: 0

R-Apriori: An Efficient Apriori based Algorithm on Spark

车间管理 (Workshop Management)

Pub Date: 2015-10-18 DOI: 10.1145/2809890.2809893

Sanjay Rathee, Manohar Kaul, Arti Kashyap

Association rule mining remains a very popular and effective method for extracting meaningful information from large datasets. It tries to find possible associations between items in large transaction-based datasets. To create these associations, frequent patterns must first be generated. The Apriori algorithm, one of the earliest proposed frequent pattern generation algorithms, and its improved variants remain a preferred choice because they are easy to implement and naturally amenable to parallelization. While many efficient single-machine methods for Apriori exist, the massive amount of data available today is far beyond the capacity of a single machine. Hence, there is a need to scale across multiple machines to meet the demands of this ever-growing data. MapReduce is a popular fault-tolerant framework for distributed applications. Nevertheless, the heavy disk I/O of each MapReduce operation hinders the implementation of efficient iterative data mining algorithms, such as Apriori, on MapReduce platforms. Spark, a newly proposed in-memory distributed dataflow platform, overcomes the disk I/O bottlenecks of MapReduce and is therefore an ideal platform for distributed Apriori. However, the most computationally expensive task in Apriori is generating candidate sets containing all possible pairs of frequent singleton items and comparing each pair with every transaction record. We propose a new approach that dramatically reduces this computational complexity by eliminating the candidate generation step and avoiding costly comparisons. We conduct in-depth experiments to gain insight into the effectiveness, efficiency, and scalability of our approach. Our studies show that our approach outperforms classical Apriori and the state of the art on Spark many times over on different datasets.
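
The candidate-generation cost described above can be seen in a small single-machine Python sketch of Apriori's second pass. Rather than generating every pair of frequent singletons and comparing each pair with every transaction, it filters each transaction to its frequent items and counts only the pairs that actually occur. This illustrates the general idea of skipping explicit candidate generation; it is not R-Apriori itself, which the abstract describes only at a high level, it does not use Spark, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def frequent_items(transactions: List[Set[str]], min_support: int) -> Set[str]:
    """Pass 1: items that occur in at least min_support transactions."""
    counts = Counter(item for t in transactions for item in t)
    return {item for item, c in counts.items() if c >= min_support}


def frequent_pairs(transactions: List[Set[str]],
                   min_support: int) -> Dict[Tuple[str, str], int]:
    """Pass 2 without an explicit candidate set.

    Instead of building every pair of frequent items and testing each pair
    against every transaction, count only the pairs that actually occur in
    each transaction after infrequent items are dropped.
    """
    singles = frequent_items(transactions, min_support)
    pair_counts: Counter = Counter()
    for t in transactions:
        # Apriori property: an infrequent item cannot belong to a frequent pair.
        kept = sorted(t & singles)
        pair_counts.update(combinations(kept, 2))
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
pairs = frequent_pairs(transactions, min_support=3)
```

With `min_support = 3`, four pairs survive, while ("beer", "bread") and ("beer", "milk") occur in only two transactions each. On Spark, the per-transaction loop would map naturally onto an RDD `flatMap` that emits `(pair, 1)` records followed by `reduceByKey` to sum them, though that mapping is an illustration rather than a description of the paper's implementation.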

Citations: 71
