
Latest publications: 2017 14th Web Information Systems and Applications Conference (WISA)

What are the Factors Impacting Build Breakage?
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.17
Yang Luo, Yangyang Zhao, Wanwangying Ma, Lin Chen
Continuous Integration (CI) has become a good practice in software development in recent years. As an essential part of CI, the build creates software from source code. Predicting the build outcome helps developers review and fix bugs before building, saving time. However, we lack objective evidence about the practical factors affecting build results. Travis CI provides a hosted, distributed continuous integration service used to build and test software projects hosted on GitHub. TravisTorrent is a dataset that deeply analyzes the source code, process, and dependency status of projects hosted on Travis CI. We use this dataset to investigate which factors may impact a build result. We first preprocess the TravisTorrent data to extract 27 features. We then analyze the correlation between these features and the result of a build. Finally, we build four prediction models to predict the result of a build and perform a horizontal comparison. We found that in our study, the number of commits in a build (git_num_all_built_commits) is the most important factor, with a significant impact on the build result, and that SVM performs best among the four prediction models we used.
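The correlation step described above can be sketched as follows. This is an illustrative toy, not the paper's pipeline: the feature values and outcomes are invented, and only `git_num_all_built_commits` is a real TravisTorrent feature name; features are ranked by the absolute Pearson (point-biserial) correlation with the binary build result.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

# Toy builds: (git_num_all_built_commits, src_churn, outcome 1=passed / 0=broken)
builds = [(1, 10, 1), (2, 40, 1), (6, 35, 0), (8, 90, 0), (3, 20, 1), (7, 60, 0)]
features = {
    "git_num_all_built_commits": [b[0] for b in builds],
    "src_churn": [b[1] for b in builds],
}
outcomes = [b[2] for b in builds]

# Rank features by strength of association with the build result
ranked = sorted(features, key=lambda f: -abs(pearson(features[f], outcomes)))
print(ranked[0])  # the feature most correlated with build outcome
```

In a real replication, the 27 extracted features would replace the two toy columns, and a classifier such as SVM would be trained on the same matrix.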
Citations: 13
Online Map Matching Algorithm Using Segment Angle Based on Hidden Markov Model
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.19
Jie Xu, Na Ta, Chunxiao Xing, Yong Zhang
The Global Positioning System (GPS) is used to locate specific points on the Earth. Although GPS positioning technology is becoming more and more mature, GPS readings always contain inherent equipment errors or measurement-method errors, so map matching is a very important preprocessing step for many applications, such as traffic flow control, taxi mileage calculation, and locating people. However, many current methods only consider distance variables and do not handle the angle between two segments. In this paper, we propose a new road-network map matching algorithm that considers not only the distance between two sample points but also the angle between two candidate segments, using the Hidden Markov Model (HMM), a popular solution for map matching. To solve the HMM, we use the dynamic-programming Viterbi algorithm to find the maximum-probability road segments. Experiments on a real BEIJING CITY map dataset show that our map matching algorithm significantly improves accuracy compared with the ST-Matching global algorithm.
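A minimal log-space Viterbi sketch of this idea follows. The segments, emission function (distance-based), and transition function (angle-based) are hand-made toys, not the paper's exact formulas; it only shows how distance and angle terms combine in HMM decoding.

```python
import math

def viterbi_match(points, candidates, emission, transition):
    """Viterbi decoding over per-point candidate road segments (log-space)."""
    prev = {c: math.log(emission(points[0], c)) for c in candidates[0]}
    back = []
    for t in range(1, len(points)):
        cur, ptr = {}, {}
        for c in candidates[t]:
            best = max(prev, key=lambda p: prev[p] + math.log(transition(p, c)))
            cur[c] = (prev[best] + math.log(transition(best, c))
                      + math.log(emission(points[t], c)))
            ptr[c] = best
        back.append(ptr)
        prev = cur
    path = [max(prev, key=prev.get)]  # best final segment, then backtrack
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy road segments: name -> (perpendicular offset, heading in degrees)
segments = {"A": (0.0, 0.0), "B": (1.0, 90.0)}
emission = lambda p, s: math.exp(-(p[1] - segments[s][0]) ** 2)    # distance term
transition = lambda a, b: math.exp(-abs(segments[a][1] - segments[b][1]) / 90.0)

points = [(0, 0.1), (1, 0.0), (2, 0.2)]  # GPS samples running close to y = 0
cands = [list(segments)] * len(points)
matched = viterbi_match(points, cands, emission, transition)
print(matched)
```

Because all samples hug the y = 0 road and heading changes are penalized, the decoder stays on segment "A" throughout.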
Citations: 1
A Domain-Independent Multi-modifier Entity Search Method
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.41
Huan Liao, Yukun Li, Gang Hao, Dexin Zhao, Yongxuan Lai, Weiwei Wang
Entity search is a new search pattern that returns related entities to users rather than large numbers of web pages containing massive, messy information. It is also a challenging research topic, because it is difficult to understand the meaning of users' input and to identify entities in messy web pages. In this paper, we propose an entity search pattern based on online encyclopedias and define it as MMK search (Multi-modifier Search), meaning that the input text includes only one kernel concept and multiple modifiers. We propose a solution framework for this kind of search and a method to identify the expected entities based on widely used online encyclopedias. To evaluate the methods, we created an experimental dataset and a baseline with the help of participants; the results verify the effectiveness of our methods.
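A hypothetical sketch of the MMK idea, not the paper's actual method: the query is assumed to be pre-split into one kernel concept plus modifiers, and candidate entities (with encyclopedia-style descriptions) are ranked by how many modifiers their description covers. All entity names and descriptions below are invented examples.

```python
def rank_entities(kernel, modifiers, entities):
    """Rank entities matching the kernel concept by modifier coverage."""
    scored = []
    for name, description in entities.items():
        text = description.lower()
        if kernel.lower() not in text:
            continue  # the entity must match the kernel concept at all
        coverage = sum(1 for m in modifiers if m.lower() in text)
        scored.append((coverage, name))
    # keep only entities covering at least one modifier, best coverage first
    return [name for cov, name in sorted(scored, reverse=True) if cov > 0]

entities = {
    "Lake Baikal": "a deep freshwater lake in Siberia, Russia",
    "Lake Victoria": "a large freshwater lake in Africa",
    "Caspian Sea": "a salt lake between Europe and Asia",
}
result = rank_entities("lake", ["freshwater", "deep"], entities)
print(result)
```

For the query "deep freshwater lake", the kernel is "lake" and the two modifiers filter and order the candidates.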
Citations: 0
User-Driven Filtering and Ranking of Topical Datasets Based on Overall Data Quality
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.24
Wenze Xia, Zhuoming Xu, Chengwang Mao
Finding relevant and high-quality data is an eternal need for data consumers (i.e., users). Many open data portals provide users with simple ways of finding datasets on a particular topic (i.e., topical datasets), but not a way of filtering and ranking topical datasets based on data quality. Despite recent advances in the development and standardization of data quality models and vocabularies, there is a lack of systematic research on approaches and tools for user-driven, data quality-based filtering and ranking of topical datasets. In this paper we address the problem of user-driven filtering and ranking of topical datasets based on their overall data quality by developing a generic software architecture and a corresponding approach, called ODQFiRD, for filtering and ranking topical datasets according to user-specified data quality assessment criteria. Additionally, we use our implemented prototype of ODQFiRD to conduct a case study on the U.S. Government's open data portal. The prototype implementation and experimental results show that our proposed ODQFiRD is feasible and effective.
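An illustrative sketch (not the actual ODQFiRD implementation): each dataset carries scores for several quality dimensions; the user supplies minimum thresholds (filtering) and weights (ranking by an overall quality score). Dataset names, dimensions, and scores below are invented.

```python
def filter_and_rank(datasets, thresholds, weights):
    """Drop datasets below any threshold, rank the rest by weighted quality."""
    passing = [
        (name, scores) for name, scores in datasets.items()
        if all(scores[d] >= t for d, t in thresholds.items())
    ]
    def overall(scores):
        total = sum(weights.values())
        return sum(weights[d] * scores[d] for d in weights) / total
    return sorted(passing, key=lambda ns: overall(ns[1]), reverse=True)

datasets = {
    "budget-2016": {"completeness": 0.9, "timeliness": 0.4},
    "budget-2017": {"completeness": 0.8, "timeliness": 0.9},
    "budget-draft": {"completeness": 0.3, "timeliness": 0.9},
}
thresholds = {"completeness": 0.5}            # user-driven filtering criterion
weights = {"completeness": 0.5, "timeliness": 0.5}  # user-driven ranking weights
ranked = filter_and_rank(datasets, thresholds, weights)
print([name for name, _ in ranked])
```

The threshold removes the draft dataset, and the weighted average puts the more timely 2017 dataset first.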
Citations: 0
Knowledge Graph Construction Based on Judicial Data with Social Media
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.46
Hao Lian, Zemin Qin, Tieke He, B. Luo
With the progress of information openness and the development of Internet technology, judicial data have begun to enter the public view. Their carrier is the referee document, which reflects almost all information about a case. Everyone is now a social media content producer and consumer, representing the public's opinion about the law, so legal significance is no longer limited to the professional field but also includes social cognitive meanings. Digging into the relationship between professional legal meaning and social cognition has therefore become an important issue. We use a knowledge graph to construct the relationship network between social media and legal entities in professional legal data, and introduce the related knowledge graph methods.
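A toy sketch of such a relationship network as subject-relation-object triples; the entity and relation names below are invented examples, not the paper's data or schema.

```python
# Invented triples linking a judicial case, a court, and a social media post
triples = [
    ("case_001", "judged_by", "court_A"),
    ("case_001", "discussed_in", "weibo_post_17"),
    ("weibo_post_17", "mentions", "court_A"),
]

def neighbors(entity, triples):
    """All (relation, other-entity) pairs touching the given entity."""
    outgoing = {(r, o) for s, r, o in triples if s == entity}
    incoming = {(r, s) for s, r, o in triples if o == entity}
    return outgoing | incoming

print(sorted(neighbors("court_A", triples)))
```

Traversing such triples connects professional legal entities (cases, courts) with social media content about them.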
Citations: 9
Topic Classification Based on Improved Word Embedding
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.44
Liangliang Sheng, Lizhen Xu
Topic classification is a foundational task in many NLP applications. Traditional topic classifiers often rely on many human-designed features, while in recent years word embeddings and convolutional neural networks based on deep learning have been introduced to realize topic classification. In this paper, the influence of different word embeddings on CNN classifiers is studied, and an improved word embedding named HybridWordVec is proposed, which combines word2vec with a topic distribution vector. Experiments are conducted on the Chinese Fudan corpus and the English 20Newsgroups corpus. They show that a CNN with HybridWordVec achieves an accuracy of 91.82% on the Chinese corpus and 95.67% on the English corpus, suggesting that HybridWordVec clearly improves classification accuracy compared with other word embedding models such as word2vec and GloVe.
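A minimal sketch of the HybridWordVec combination as described: concatenate a word's word2vec-style embedding with its topic-distribution vector. The vectors here are toy values; real ones would come from a trained word2vec model and a topic model.

```python
def hybrid_vector(word, word_vecs, topic_dists):
    """Concatenate the embedding and topic-distribution vectors for a word."""
    return word_vecs[word] + topic_dists[word]  # list concatenation

word_vecs = {"market": [0.2, -0.1, 0.7]}   # stand-in for word2vec output
topic_dists = {"market": [0.6, 0.3, 0.1]}  # stand-in for P(topic | word)

v = hybrid_vector("market", word_vecs, topic_dists)
print(len(v))  # dimensionality of the hybrid embedding
```

The resulting 6-dimensional vector would feed the CNN's embedding layer in place of the plain word2vec vector.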
Citations: 4
Sentiment Tendency Analysis of THAAD Event in Indonesian News
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.48
Ye Liang, Bing Fu, Zongchun Li
Sentiment tendency analysis of news reports aims to discover the audience's attitude towards a hot event, an important research topic within emotion analysis. In the context of China's going-out strategy, potential risks can be effectively avoided with the help of in-depth study and interpretation of China's relevant policy in a given country and region, together with an understanding of local public opinion and conditions. As the official languages of the countries along the Belt and Road are mostly less-resourced languages, the existing mature tools that target Chinese and English cannot be used, so sentiment tendency analysis for these languages is a very challenging task. Building on basic resources (an Indonesian sentiment dictionary, a degree adverb dictionary, a negation word dictionary, and a stop word dictionary), this paper puts forward a set of methods for calculating sentiment tendencies in Indonesian, applies them to the sentiment tendencies related to the THAAD event in three major mainstream Indonesian media, and interprets and analyzes the results.
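A hedged sketch of dictionary-based scoring in the style described above: each sentiment word contributes its polarity, scaled by a preceding degree adverb and flipped by a preceding negation word. The tiny English lexicons are stand-ins for the paper's Indonesian dictionaries, and the scoring rules are illustrative, not the paper's exact formulas.

```python
# Stand-in lexicons (the paper uses Indonesian dictionaries)
SENTIMENT = {"good": 1.0, "bad": -1.0}
DEGREE = {"very": 2.0, "slightly": 0.5}
NEGATION = {"not"}

def sentiment_score(tokens):
    """Sum sentiment-word polarities, adjusted by nearby adverbs/negations."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in SENTIMENT:
            continue
        weight = SENTIMENT[tok]
        # look back a small window for degree adverbs and negation words
        for prev in tokens[max(0, i - 2):i]:
            if prev in DEGREE:
                weight *= DEGREE[prev]
            elif prev in NEGATION:
                weight *= -1
        score += weight
    return score

print(sentiment_score("the product is not very good".split()))
```

"not very good" flips and intensifies the polarity of "good", yielding a negative overall tendency.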
Citations: 2
Caching-Aware Techniques for Query Workload Partitioning in Parallel Search Engines
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.33
Chuanfei Xu, Yanqiu Wang, Pin Lv, Jia Xu
In this work, we propose efficient query workload partitioning techniques to reduce query processing times in parallel search engines. Existing methods cannot offer both high cache hit ratios and caching-aware load balance. To solve this problem, we propose effective solutions that capture the tradeoff between cache hit ratio and load balance to reduce the total query processing time. The performance of the proposed algorithms is demonstrated by extensive experiments on real datasets; the results show that our algorithms improve efficiency by at least 30% compared with extensions of current methods such as the round-robin-based algorithm.
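An illustrative greedy sketch of the tradeoff described above (not the paper's algorithm): each query goes to the partition maximizing a weighted combination of expected cache reuse (shared query terms) and load balance. The query IDs, terms, and the `alpha` mixing weight are invented.

```python
def assign_queries(queries, n_partitions, alpha=0.5):
    """Greedily place (query_id, term_set) pairs onto partitions."""
    caches = [set() for _ in range(n_partitions)]
    loads = [0] * n_partitions
    placement = {}
    for qid, terms in queries:
        def score(p):
            reuse = len(caches[p] & terms) / max(len(terms), 1)
            balance = 1.0 - loads[p] / (sum(loads) + 1)
            return alpha * reuse + (1 - alpha) * balance
        best = max(range(n_partitions), key=score)
        placement[qid] = best
        caches[best] |= terms  # this partition's cache now holds these terms
        loads[best] += 1
    return placement

queries = [("q1", {"web", "search"}), ("q2", {"web", "cache"}), ("q3", {"gpu"})]
print(assign_queries(queries, 2))
```

Queries sharing the term "web" land on the same partition to exploit its cache, while the unrelated query goes to the less-loaded one.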
Citations: 0
An XPath-Based Approach to Reusing Test Scripts for Android Applications
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.49
Fei Song, Zhuoming Xu, F. Xu
The version of an Android application (app) is updated frequently, and rewriting test scripts for each version update is laborious and expensive, so reusing existing test scripts is a better choice. Although an app's business logic is relatively stable as its versions evolve, user interface (UI) control changes in a new version tend to cause the original test scripts to fail, which is the main problem in test script reuse. In this paper we address this problem by developing an XPath-based approach to reusing test scripts for Android apps when the locations, names, or property values of UI controls change. In our approach, the test scripts use XPath expressions to locate the UI controls. The approach first identifies failed test scripts and no-longer-valid XPath expressions by executing the original test scripts on the new version of the app. Next, it uses the invalid XPath expressions to find the difference between the two DOMs corresponding to a view in the changed page of the new version and a view in the original page of the previous version, respectively. Finally, it uses the DOM difference to repair the XPath expressions, thereby achieving the reuse of test scripts. We have implemented a prototype of the approach based on Robotium and used it to conduct experiments on two real-world Android apps. The results show that our approach achieves a higher script reuse percentage than Robotium.
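A simplified sketch of the repair idea, not the paper's exact DOM-diff algorithm: when an XPath no longer matches the new UI tree, locate the control by a stable property (here its `text` attribute) and rebuild an absolute path. The UI tree and tag names are invented examples.

```python
import xml.etree.ElementTree as ET

OLD_XPATH = ".//LinearLayout/Button[@text='Submit']"   # valid for the old version
NEW_UI = ET.fromstring(
    "<root><RelativeLayout><Button text='Submit'/></RelativeLayout></root>"
)

def path_to(root, target):
    """Rebuild a simple absolute path from root down to target."""
    if root is target:
        return "."
    for child in root:
        sub = path_to(child, target)
        if sub is not None:
            return child.tag if sub == "." else child.tag + "/" + sub
    return None

def repair(root, old_xpath, stable_text):
    if root.findall(old_xpath):          # old locator still works: keep it
        return old_xpath
    for elem in root.iter():             # fall back to the stable property
        if elem.get("text") == stable_text:
            return "./" + path_to(root, elem) + "[@text='%s']" % stable_text
    return None

repaired = repair(NEW_UI, OLD_XPATH, "Submit")
print(repaired)
```

Here the button moved from a `LinearLayout` into a `RelativeLayout`, so the old locator fails and a new path is generated from the unchanged button text.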
Citations: 3
Mining Frequent Intra-Sequence and Inter-Sequence Patterns Using Bitmap with a Maximal Span
Pub Date : 2017-11-01 DOI: 10.1109/WISA.2017.70
Wenzhe Liao, Qian Wang, Luqun Yang, Jiadong Ren, D. Davis, Changzhen Hu
Frequent intra-sequence pattern mining and inter-sequence pattern mining are both important forms of association rule mining for different applications. However, most algorithms focus on just one of them, as attempting both is usually inefficient. To address this deficiency, we propose FIIP-BM, a Frequent Intra-sequence and Inter-sequence Pattern mining algorithm using bitmaps with a maximal span (maxSpan). FIIP-BM transforms each transaction into a bit vector, adjusts the maximal span according to the user's demand, and obtains the frequent sequences via bitwise AND operations. For candidate 2-pattern generation, the subscripts of the joining items are checked first; if a subscript is not 0, the bit vector of the joining item is left-shifted before the calculation. A left-alignment rule handles differing bit vector lengths. FIIP-BM can mine both intra-sequence and inter-sequence patterns. Experiments demonstrate the computational speed and memory efficiency of the FIIP-BM algorithm.
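The core bitmap representation can be sketched as follows (a minimal illustration; the maxSpan adjustment and the inter-sequence left-shift step are omitted): each item maps to a bit vector over transactions, with bit i set when the item appears in transaction i, and the support of an itemset is the popcount of the AND of its bit vectors.

```python
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

def build_bitmaps(transactions):
    """One integer bit vector per item: bit i set = item in transaction i."""
    items = sorted(set().union(*transactions))
    return {it: sum(1 << i for i, t in enumerate(transactions) if it in t)
            for it in items}

def support(itemset, bitmaps, n_transactions):
    vec = (1 << n_transactions) - 1  # all-ones bit vector
    for it in itemset:
        vec &= bitmaps[it]           # bitwise AND narrows down co-occurrences
    return bin(vec).count("1")       # popcount = number of supporting transactions

bm = build_bitmaps(transactions)
print(support({"a", "b"}, bm, len(transactions)))
```

Items "a" and "b" co-occur in two transactions, so the AND of their bit vectors has two set bits.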
{"title":"Mining Frequent Intra-Sequence and Inter-Sequence Patterns Using Bitmap with a Maximal Span","authors":"Wenzhe Liao, Qian Wang, Luqun Yang, Jiadong Ren, D. Davis, Changzhen Hu","doi":"10.1109/WISA.2017.70","DOIUrl":"https://doi.org/10.1109/WISA.2017.70","url":null,"abstract":"Frequent intra-sequence pattern mining and inter-sequence pattern mining are both important ways of association rule mining for different applications. However, most algorithms focus on just one of them, as attempting both is usually inefficient. To address this deficiency, FIIP-BM, a Frequent Intra-sequence and Inter-sequence Pattern mining algorithm using Bitmap with a maxSpan is proposed. FIIP-BM transforms each transaction to a bit vector, adjusts the maximal span according to user's demand and obtains the frequent sequences by logic And-operation. For candidate 2-pattern generation, the subscripts of the joining items should be checked first; the bit vector of the joining item will be left-shifted before calculation if the subscript is not 0. Left alignment rule is used for different bit vector length problems. FIIP-BM can mine both intra-sequence and inter-sequence patterns. Experiments are conducted to demonstrate the computational speed and memory efficiency of the FIIP-BM algorithm.","PeriodicalId":204706,"journal":{"name":"2017 14th Web Information Systems and Applications Conference (WISA)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131814941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
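The core bitmap operations in the entry above — encode each sequence's transactions as bit vectors, left-shift one item's bitmap within the maximal span, and AND it with another item's bitmap to count candidate 2-pattern support — can be sketched as follows. This is a minimal Python illustration loosely following the FIIP-BM idea; the toy sequence database and function names are assumptions, not the paper's implementation.

```python
# A sequence database: each sequence is a list of transactions (itemsets).
sequences = [
    [{"a"}, {"b"}, {"a", "b"}],
    [{"a"}, {"a", "b"}, {"b"}],
    [{"b"}, {"a"}, set(), {"b"}],
]

def item_bitmap(seq, item):
    """Bit i is set iff transaction i of the sequence contains the item."""
    bits = 0
    for i, txn in enumerate(seq):
        if item in txn:
            bits |= 1 << i
    return bits

def support_2pattern(item1, item2, max_span):
    """Count sequences where item1 is followed by item2 within max_span
    transactions: left-shift item1's bitmap by 1..max_span positions and
    AND the result with item2's bitmap."""
    count = 0
    for seq in sequences:
        b1 = item_bitmap(seq, item1)
        b2 = item_bitmap(seq, item2)
        reachable = 0
        for k in range(1, max_span + 1):
            reachable |= b1 << k   # positions reachable within the span
        if reachable & b2:
            count += 1
    return count

print(support_2pattern("a", "b", max_span=1))  # → 2
print(support_2pattern("a", "b", max_span=2))  # → 3
```

Widening the maximal span lets the third sequence (where "a" and the later "b" are two transactions apart) contribute, which is why the second call returns a higher support.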