
International journal of database theory and application: Latest Publications

A Graph Theoretic Approach for the Identification of Objects Shape Taken from MPEG-7 Database
Pub Date : 2017-03-31 DOI: 10.14257/IJDTA.2017.10.3.02
J. Pujari, J. Karur, K. Kale, V. Swamy
Objects rarely occur in isolation; instead, they appear alongside other objects and within a particular environment. Recognizing similar objects efficiently therefore calls for an automated approach. In this paper, we propose a graph-theoretic approach to identify objects from the MPEG-7 database, which consists of 69 classes. Graph parameters such as eccentricity, diameter, radius and center values are used to form the feature vector. A back-propagation neural network (BPNN) is used as the classifier, and the feature set is reduced based on each feature's contribution to identification. Experimental results show that an average identification accuracy of 91% is attained. The study is extended by combining other feature extraction techniques to train the neural network. This work finds application in training robots in the automotive industry to handle objects.
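The abstract does not include the feature-extraction code; the sketch below is only a rough illustration, assuming a connected NetworkX graph has already been built from an object's shape (that construction step is not shown), of how eccentricity, diameter, radius and center could be collected into a feature vector for a classifier such as a BPNN.

```python
# Minimal sketch (not the authors' implementation): global graph parameters
# gathered into a feature vector. How the graph is derived from an MPEG-7
# shape (e.g. from contour sample points) is assumed and not shown.
import networkx as nx
import numpy as np

def graph_shape_features(G: nx.Graph) -> np.ndarray:
    ecc = nx.eccentricity(G)                  # per-node eccentricity (G must be connected)
    diameter = max(ecc.values())              # graph diameter
    radius = min(ecc.values())                # graph radius
    center_size = sum(1 for v in ecc if ecc[v] == radius)  # size of the graph center
    mean_ecc = sum(ecc.values()) / len(ecc)   # average eccentricity
    return np.array([diameter, radius, center_size, mean_ecc], dtype=float)

# Example on a toy graph; real feature vectors would then be fed to the BPNN classifier.
print(graph_shape_features(nx.cycle_graph(6)))
```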
Citations: 0
Mining High-Utility Itemsets Based on Multiple Minimum Support and Multiple Minimum Utility Thresholds
Pub Date : 2017-03-31 DOI: 10.14257/IJDTA.2017.10.3.03
Fazla Elahe, Kun Zhang
Mining high-utility itemsets from a transactional database refers to the discovery of itemsets that generate high profit, and several approaches have been proposed for this task in recent years. Algorithms such as HUIM-MMU and MHU-Growth overcome the limitation of using a single threshold for the whole database. However, they still generate a large number of candidate itemsets, which degrades their performance. In this paper, we address this issue by combining the two different kinds of thresholds used by HUIM-MMU and MHU-Growth. Using these two thresholds, we propose two algorithms, HUIM-MMSU and HUIM-IMMSU. HUIM-MMSU is a candidate generation and test based algorithm that relies on the sorted downward closure (SDC) property, whereas HUIM-IMMSU uses a tree-like data structure. Experimental results show that the two proposed algorithms can effectively discover high-utility itemsets from a transactional database.
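The paper's algorithms are not reproduced here; as a simplified, hedged illustration of the underlying idea, the sketch below computes itemset utilities over a toy transaction database and accepts an itemset only if its total utility reaches a threshold derived from per-item minimum utilities (the thresholds, profits and data layout are all assumed):

```python
# Simplified illustration (not HUIM-MMSU/HUIM-IMMSU themselves): utility of an
# itemset in a transaction database, judged against a minimum-utility threshold
# taken as the smallest threshold among the itemset's items.
from itertools import combinations

# transaction: {item: quantity}; profit: per-unit external utility of each item
transactions = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"b": 2, "c": 1}]
profit = {"a": 5, "b": 2, "c": 1}
# one minimum-utility threshold per item (the "multiple minimum utility" idea)
min_util = {"a": 20, "b": 8, "c": 6}

def utility(itemset, txn):
    """Utility of an itemset in one transaction (0 if the itemset is absent)."""
    if not all(i in txn for i in itemset):
        return 0
    return sum(txn[i] * profit[i] for i in itemset)

def is_high_utility(itemset):
    """Total utility must reach the smallest threshold among the itemset's items."""
    total = sum(utility(itemset, t) for t in transactions)
    return total >= min(min_util[i] for i in itemset)

items = sorted(profit)
for k in range(1, len(items) + 1):
    for iset in combinations(items, k):
        if is_high_utility(iset):
            print(iset, sum(utility(iset, t) for t in transactions))
```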
Citations: 0
The Application of TF-IDF with Time Factor in the Cluster of Micro-blog Theme
Pub Date : 2017-02-28 DOI: 10.14257/ijdta.2017.10.2.03
Song Yu, Yangchen Wang, Tianchi Mo, Mingyan Liu, Hui Liu, Zhifang Liao
The time factor is of great significance for topic clustering on micro-blogs: the topics discussed most frequently during a certain period are the ones likely to become hot issues. This article therefore derives a TF-IDF-TF method by dividing the timeline into periods and assigning them different weights, and applies it to the ULPIR micro-blog content corpus, using hierarchical clustering and k-means for the statistical analysis. The experimental results show that, compared with traditional TF-IDF (term frequency-inverse document frequency), the TF-IDF-TF method provides more accurate clustering results, especially for specific topics in the periods when users are most active.
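The abstract does not state the exact TF-IDF-TF weighting; the sketch below shows one plausible formulation, assumed for illustration, in which an ordinary TF-IDF weight is scaled by a per-period weight so that terms from busier periods score higher:

```python
# Illustrative sketch only: one possible TF-IDF-TF weighting in which a
# per-period weight scales the usual TF-IDF score. Periods and weights are assumed.
import math
from collections import Counter

docs = [
    {"text": "festival traffic festival", "period": "peak"},
    {"text": "weather traffic", "period": "off_peak"},
    {"text": "festival concert tickets", "period": "peak"},
]
period_weight = {"peak": 1.5, "off_peak": 1.0}   # assumed weights per time period

def tf_idf_tf(docs):
    n = len(docs)
    tokenized = [d["text"].split() for d in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))                    # document frequency per term
    weighted = []
    for d, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        w = {}
        for term, f in tf.items():
            idf = math.log(n / df[term])
            w[term] = (f / len(tokens)) * idf * period_weight[d["period"]]
        weighted.append(w)
    return weighted

for vec in tf_idf_tf(docs):
    print(vec)   # these weighted vectors would then feed hierarchical clustering or k-means
```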
Citations: 0
Comparative Analysis of Various Similarity Measures for Finding Similarity of Two Documents
Pub Date : 2017-02-28 DOI: 10.14257/IJDTA.2017.10.2.02
Maedeh Afzali, Suresh Kumar
Similarity measures are fundamental concepts in text mining and information retrieval that help us quantify the similarity between documents, which in turn improves the performance of search engines and browsing techniques. Many similarity measures are available today, but it is not clear which of them is most effective for judging the similarity of text documents. The aim of this paper is to provide a comparative analysis of several term-based similarity measures, such as cosine similarity and the Jaccard and Dice coefficients, in order to evaluate their performance in finding the similarity of two text documents.
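For reference, the three term-based measures named above can be computed as follows; the tokenization and weighting used in the paper's experiments are assumed rather than reproduced:

```python
# Minimal sketch of the three term-based similarity measures compared in the paper.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard coefficient over the two term sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def dice(a: Counter, b: Counter) -> float:
    """Dice coefficient over the two term sets."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 0.0

d1 = Counter("the quick brown fox".split())
d2 = Counter("the lazy brown dog".split())
print(cosine(d1, d2), jaccard(d1, d2), dice(d1, d2))
```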
Citations: 8
A Novel Event-centric Trend Detection Algorithm for Online Social Graph Analysis
Pub Date : 2017-02-28 DOI: 10.14257/IJDTA.2017.10.2.04
Ling Wang, Haijing Jiang, T. Zhou, Wei Ding, Chen Zhiyuan
Nowadays, identifying the most popular and important topics discussed over social networks has become a vital societal concern. To track hot topics in real time, we propose a novel event-centric trend detection algorithm, called Ec_TD, which adds event attributes to the structure of the social network and then mines the subgraphs induced by specific attributes, using a correlation function to measure the correlation of event-changing attributes over the attribute-extended social network structure. Our experiments show that the Ec_TD algorithm performs significantly better at real-time event detection and at mining the potential relationships between attributes and vertices. Moreover, we tested the algorithm on real large-scale data, where it substantially reduced response time, demonstrating the feasibility of the idea.
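The correlation function used by Ec_TD is not specified in the abstract; the sketch below only illustrates the attribute-extended graph idea, attaching a hypothetical event attribute to vertices and inducing the subgraph that shares it, which is the structure such a correlation measure would then score:

```python
# Rough sketch (not the Ec_TD algorithm itself): attach event attributes to
# vertices and induce the subgraph of vertices sharing a given attribute.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u1", "u4")])
# hypothetical event attributes on vertices
nx.set_node_attributes(G, {"u1": "launch", "u2": "launch", "u3": "outage", "u4": "launch"}, "event")

def induced_by_event(G: nx.Graph, event: str) -> nx.Graph:
    """Subgraph induced by the vertices carrying a given event attribute."""
    nodes = [v for v, e in G.nodes(data="event") if e == event]
    return G.subgraph(nodes)

sub = induced_by_event(G, "launch")
print(sub.nodes(), sub.edges())   # the subgraph whose trend would then be scored
```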
Citations: 0
Chinese Small Business Credit Scoring: Application of Multiple Hybrids Neural Network
Pub Date : 2017-02-28 DOI: 10.14257/IJDTA.2017.10.2.01
Chi Guo-tai, Mohammad Zoynul Abedin, F. Moula
In recent years, hybrid models have proven to be a promising approach for forecasting credit status; the aim of this project is therefore to examine the prediction performance of hybrid classifiers. In particular, the combination of feature engineering with popular neural network (NN) classifiers, a hybridization approach, is compared with NN classifiers alone and with three well-known baseline classifiers: stepwise discriminant analysis (SDA), stepwise logistic regression (SLR), and decision trees (DTs). Overall, we executed a 12 + 8 + (8 × 8) experimental design, resulting in 84 unique classification models, i.e., 12 baseline models, 8 NN models, and 64 hybrid models, examined over a large credit-scoring dataset from a Chinese commercial bank. In addition, thirteen evaluation measures are used for the assessment, which may be the first effort to link multiple hybrid classifiers with multiple performance metrics for the evaluation of small-business credit. The results reveal that the predictive and discriminating ability of the F-ratio-based SDA combined with a multilayer-perceptron NN classifier (SDA FR + MLP), a hybrid model, outperforms both the one-dimensional scoring models (the baseline and NN models) and its hybrid counterparts.
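The bank dataset and the exact SDA-based feature selection are not available here; the sketch below is only a rough analogue of one "feature engineering + NN" hybrid, using F-score feature selection and a multilayer perceptron on synthetic, imbalanced data:

```python
# Rough analogue (not the paper's exact SDA FR + MLP model): F-score-based feature
# selection followed by a multilayer perceptron, as one concrete hybrid pipeline.
# The data is synthetic; the commercial-bank dataset is not public.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           weights=[0.85, 0.15], random_state=0)  # imbalanced, like credit data

hybrid = make_pipeline(
    SelectKBest(f_classif, k=10),                 # keep the features with the highest F ratio
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
)
print(cross_val_score(hybrid, X, y, cv=5, scoring="roc_auc").mean())
```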
Citations: 5
Analysis of Research Trends in Regional Innovation Using Text Mining
Pub Date : 2017-02-12 DOI: 10.14257/IJDTA.2017.10.8.09
Ju Seop Park, Soongoo Hong, N. R. Kim, Bo Ra Kang
To aid local governments in solving the various regional innovation issues related to regional development, trend analyses should first be conducted. In this study, 579 abstracts published in academic journals between 2003 and 2015 were analyzed to examine research trends in topics related to regional innovation, using keyword frequency analysis and social network analysis, both of which are text mining techniques. These analyses show that, during the Roh Moo-Hyun administration, the most frequent keyword emerging from the clustering of participating entities was "regional innovation system". During the Lee Myung-Bak administration, the most frequent keyword obtained through the participation of local residents was regional innovation focused on overall business development, a focus that continued through the Park Geun-Hye administration. This study suggests a big data analysis method for deriving the core problems related to regional innovation and may trigger follow-up research. Furthermore, the results can be used as basic data for local governments and administrative agencies in establishing regional innovation policies.
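As a small illustration of the two text mining steps mentioned (keyword frequency analysis and social network analysis), the sketch below counts keywords and builds a simple co-occurrence network over a few made-up abstracts; the actual corpus and tools of the study are not reproduced:

```python
# Illustration only: keyword frequency plus a keyword co-occurrence network,
# with degree centrality as one simple social-network-analysis measure.
from collections import Counter
from itertools import combinations
import networkx as nx

abstracts = [
    "regional innovation system cluster policy",
    "regional innovation business development policy",
    "innovation cluster local government",
]
tokenized = [a.split() for a in abstracts]

# keyword frequency analysis
freq = Counter(w for tokens in tokenized for w in tokens)
print(freq.most_common(5))

# co-occurrence network: keywords are nodes, shared abstracts add edge weight
G = nx.Graph()
for tokens in tokenized:
    for u, v in combinations(sorted(set(tokens)), 2):
        w = G.get_edge_data(u, v, {"weight": 0})["weight"]
        G.add_edge(u, v, weight=w + 1)

print(sorted(nx.degree_centrality(G).items(), key=lambda x: -x[1])[:5])
```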
Citations: 0
Research on Teacher Workload Control Strategy Based on Conductive Knowledge Mining
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.23
Ye Guangzi, Chen Yuqiang, Liao Weihua
Using extension data mining technology, the conductive knowledge mining method is applied to the management of college teachers' workload. Under active transformation of the control strategy, the conductive effect on teachers' workload and its confidence are calculated to obtain the conductivity and the conductivity interval, and to mine conductive knowledge of quantitative or qualitative change. A case study of one college shows that conductive knowledge with higher support and confidence helps college management departments understand how strongly, positively or negatively, certain strategies affect teachers' research and teaching workload, so that an appropriate strategy can be chosen to control that workload.
Citations: 0
A Novel Random Forest Approach Using Specific Under Sampling Strategy
Pub Date : 2017-01-31 DOI: 10.1007/978-981-10-3223-3_24
L. Prasanthi, R. K. Kumar, K. Srinivas
{"title":"A Novel Random Forest Approach Using Specific Under Sampling Strategy","authors":"L. Prasanthi, R. K. Kumar, K. Srinivas","doi":"10.1007/978-981-10-3223-3_24","DOIUrl":"https://doi.org/10.1007/978-981-10-3223-3_24","url":null,"abstract":"","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74803370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Review on Software Defect Prediction Techniques Using Product Metrics
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.15
R. Jayanthi, L. Florence, Arti Arya
The complexity and volume of software systems are increasing at a rapid rate. In some cases this improves performance and yields efficient outcomes, but in many situations it leads to higher testing costs, meaningless results, inferior quality, and even products that cannot be trusted. Fault prediction plays a vital role in enhancing software quality and helps software testing reduce cost and time. Conventionally, software metrics are used to describe the difficulty and estimate the duration of programming. An extensive investigation is performed into forecasting the number of faults in a module using software metrics. This empirical study is conducted to identify the factors that significantly improve fault prediction models based on product metrics. The paper surveys various software metrics and the proposed procedures through which software defect prediction is enhanced, and summarizes those techniques.
Citations: 4