Pub Date : 2014-02-06DOI: 10.18495/COMENGAPP.V3I1.42
D. Duche
Power lines form the transmission medium in PLC systems; the original purpose of these lines is the transport of electric power at 50 or 60 Hz. This paper proposes a new channel modeling method for power line communication networks based on the multipath profile in the time domain. The new channel model is developed to be applied in a range of Power Line Communications (PLC) research topics such as impulse noise modeling, deployment and coverage studies, and communications theory analysis. Statistical multipath parameters such as path arrival time, magnitude, and interval are analyzed for each category to build the model. Each channel generated by the proposed model represents a different realization of a PLC network.
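The multipath model described above can be sketched as follows. This is a minimal illustration, not the paper's fitted statistics: the path count, delay range, and decay constant are made-up assumptions, and the frequency response uses the standard multipath form H(f) = Σᵢ gᵢ·exp(−j2πf·τᵢ).

```python
import numpy as np

def generate_plc_channel(n_paths=5, max_delay_us=5.0, rng=None):
    """Draw one random realization of a multipath PLC channel.

    Path arrival times are sorted uniform draws; magnitudes decay with
    delay so later echoes are weaker. All constants are illustrative.
    """
    rng = np.random.default_rng(rng)
    delays = np.sort(rng.uniform(0.0, max_delay_us, n_paths))  # microseconds
    gains = rng.uniform(0.2, 1.0, n_paths) * np.exp(-0.5 * delays)
    return delays, gains

def frequency_response(delays_us, gains, freqs_hz):
    """H(f) = sum_i g_i * exp(-j 2 pi f tau_i) over the given frequency grid."""
    tau = delays_us[:, None] * 1e-6  # convert to seconds
    return (gains[:, None] * np.exp(-2j * np.pi * freqs_hz[None, :] * tau)).sum(axis=0)

delays, gains = generate_plc_channel(rng=42)
H = frequency_response(delays, gains, np.linspace(1e6, 30e6, 256))
```

Each call with a fresh seed yields a different channel realization, matching the paper's idea that every generated channel represents a different PLC network.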
{"title":"Power Line Communication Performance Channel Characteristics","authors":"D. Duche","doi":"10.18495/COMENGAPP.V3I1.42","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V3I1.42","url":null,"abstract":"Power lines form the medium of transmission in PLC systems. The original purpose of these lines is the transportation of electric signals at 50 or 60 Hz .This paper proposes a new channel modeling method for power line communications networks based on the multipath profile in the time domain. The new channel model is developed to be applied in a range of Power line Communications (PLC) research topics such as impulse noise modeling, deployment and coverage studies, and communications theory analysis. The statistical multipath parameters such as path arrival time, magnitude and interval for each category are analyzed to build the model. Each generated channel based on the proposed Power line communication that a performance channel characteristic represents a different realization of a PLC network","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115885209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2014-02-06DOI: 10.18495/COMENGAPP.V3I1.39
T. Bijeesh
Compressed sensing (CS) is a data acquisition technique that is gaining popularity because the original signal can be reconstructed even if it was sampled at a sub-Nyquist rate. In contrast to traditional sampling, in CS we take a few measurements of the signal, and the original signal can then be reconstructed from these measurements using an optimization technique called l1-minimization. Computer engineers and mathematicians have been equally fascinated by this trend in digital signal processing. In this work we evaluate different l1-minimization algorithms on their performance in reconstructing the signal in the CS setting. The algorithms evaluated are PALM (Primal Augmented Lagrangian Multiplier method), DALM (Dual Augmented Lagrangian Multiplier method), and ISTA (Iterative Soft Thresholding Algorithm). The evaluation is based on three parameters: execution time, PSNR, and RMSE.
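Of the three algorithms compared, ISTA is the simplest to state: repeat a gradient step on the data-fit term followed by soft-thresholding. A minimal sketch, assuming the textbook lasso formulation min ‖Ax−b‖²/2 + λ‖x‖₁ with step size 1/L (the problem sizes, λ, and iteration count below are arbitrary choices, not the paper's setup):

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam=0.1, n_iter=200):
    """Iterative Soft Thresholding for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

# Recover a 3-sparse signal from 30 random measurements (sub-Nyquist).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
b = A @ x_true
x_hat = ista(A, b, lam=0.05, n_iter=500)
```

PALM and DALM solve related augmented-Lagrangian formulations and typically converge in fewer iterations, at a higher per-iteration cost.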
{"title":"Performance evaluation of popular l1-minimization algorithms in the context of Compressed Sensing","authors":"T. Bijeesh","doi":"10.18495/COMENGAPP.V3I1.39","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V3I1.39","url":null,"abstract":"Compressed sensing (CS) is a data acquisition technique that is gaining popularity because of the fact that the reconstruction of the original signal is possible even if it was sampled at a sub-Nyquist rate. In contrast to the traditional sampling method, in CS we take a few measurements from the signal and the original signal can then be reconstructed from these measurements by using an optimization technique called l1 -minimization. Computer engineers and mathematician have been equally fascinated by this latest trend in digital signal processing. In this work we perform an evaluation of different l1 -minimization algorithms for their performance in reconstructing the signal in the context of CS. The algorithms that have been evaluated are PALM (Primal Augmented Lagrangian Multiplier method), DALM (Dual Augmented Lagrangian Multiplier method) and ISTA (Iterative Soft Thresholding Algorithm). The evaluation is done based on three parameters which are execution time, PSNR and RMSE.","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124744111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-12-11DOI: 10.18495/COMENGAPP.V2I3.36
Amir Mirzadeh Phirouzabadi, M. Mahmoudian, M. Asghari
Nowadays, innovation ranks alongside quality, speed, dependability, flexibility, and cost as a key practice: it helps organizations enter new markets, increase existing market share, and gain a competitive edge. In addition, organizations have moved from hiding ideas (closed innovation) to opening them up (open innovation), so concepts such as "open innovation" and "innovation network" have become important and beneficial to both academia and the market. This study therefore empirically examines the effects of networking on innovation. Using the OECD typology of innovation (organizational, marketing, process, and product), it compares the innovation of 45 companies in the Pardis Technology Park network before and after they joined the network, as a case study. The results show that all innovation types increased after the companies joined the network; ordered from largest to smallest change, they are marketing, process, organizational, and product innovation, although some individual measures showed negative growth after joining.
{"title":"How Networking Empirically Influences the Types of Innovation?: Pardis Technology Park as a Case Study","authors":"Amir Mirzadeh Phirouzabadi, M. Mahmoudian, M. Asghari","doi":"10.18495/COMENGAPP.V2I3.36","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V2I3.36","url":null,"abstract":"Nowadays, Innovation can be named as one of the best practices as quality, speed, dependability, flexibility and cost which it helps organization enter to new markets, increase the existing market share and provide it with a competitive edge. In addition, organizations have moved forward from “hiding idea (Closed Innovation)” to “opening them (Open Innovation)”. Therefore, concepts such as “open innovation” and “innovation network” have become important and beneficial to both academic and market society. Therefore, this study tried to empirically study the effects of networking on innovations. In this regard, in order to empirically explore how networking influences innovations, this paper used types of innovations based on OCED definition as organizational, marketing, process and product and compared their changes before and after networking of 45 companies in the network Pardis Technology Park as a case study. The results and findings showed that all of the innovation types were increased after jointing the companies to the network. In fact, we arranged these changing proportions from the most to the least change as marketing, process, organizational and product innovation respectively. 
Although there were some negative growth in some measures of these innovations after jointing into the network.","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123647382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-12-11DOI: 10.18495/COMENGAPP.V2I3.30
Pooja Gupta, Ashish Kumar
The paper proposes a framework to improve privacy-preserving data mining. The approach provides security at both ends, i.e., during data transmission as well as in the data mining process, using two phases: secure data transmission is handled with elliptic curve cryptography (ECC), and privacy is preserved with k-anonymity. The proposed framework thus ensures a highly secure environment. We observed that the framework outperforms other approaches [8] discussed in the literature on both security and privacy of data; most existing approaches consider either secure transmission or privacy-preserving data mining, but very few consider both. We used WEKA 3.6.9 for experimentation and analysis of our approach. We also analyzed the k-anonymity case where the number of records in a group is less than k (the hiding factor) by inserting fake records. The results show that inserting fake records yields higher accuracy than fully suppressing records: full suppression may hide important information when a group has fewer than k records, whereas with fake-record insertion the records remain available even when a group has fewer than k records.
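The fake-record strategy described above can be sketched as follows. This is an illustrative reading of the idea, not the paper's implementation: records are grouped by their quasi-identifier values, and any group smaller than k is padded with clearly marked fake records instead of being suppressed (the field names are invented for the example).

```python
from collections import defaultdict
import copy

def enforce_k_anonymity(records, quasi_ids, k):
    """Pad every quasi-identifier group with fewer than k records
    with fake records, instead of suppressing the whole group."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[q] for q in quasi_ids)].append(rec)
    out = []
    for group in groups.values():
        out.extend(group)
        for _ in range(k - len(group)):       # pad undersized groups only
            fake = copy.deepcopy(group[0])    # clone an existing member
            fake["_fake"] = True              # mark so it can be discarded later
            out.append(fake)
    return out

data = [{"zip": "110", "age": "20-30", "disease": "flu"},
        {"zip": "110", "age": "20-30", "disease": "cold"},
        {"zip": "220", "age": "30-40", "disease": "flu"}]
anon = enforce_k_anonymity(data, ["zip", "age"], k=2)
```

After padding, every quasi-identifier group has at least k members, so no real record is distinguishable within its group, yet all real records stay available to the miner.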
{"title":"Two phase privacy preserving data mining","authors":"Pooja Gupta, Ashish Kumar","doi":"10.18495/COMENGAPP.V2I3.30","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V2I3.30","url":null,"abstract":"The paper proposes a framework to improve the privacy preserving data mining. The approach adopted provides security at both the ends i.e. at the data transmission time as well as in the data mining process using two phases. The secure data transmission is handled using elliptic curve cryptography (ECC) and the privacy is preserved using k-anonymity. The proposed framework ensures highly secure environment. We observed that the framework outperforms other approaches [8] discussed in the literature at both ends i.e. at security and privacy of data. Since most of the approaches have considered either secure transmission or privacy preserving data mining but very few have considered both. We have used WEKA 3.6.9 for experimentation and analysis of our approach. We have also analyzed the case of k-anonymity when the numbers of records in a group are less than k (hiding factor) by inserting fake records. The obtained results have shown the pattern that the insertion of fake records leads to more accuracy as compared to full suppression of records. 
Since, full suppression may hide important information in cases where records are less than k, on the other hand in the process of fake records insertion; records are available even if number of records in a group is less than k.","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122228245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-12-11DOI: 10.18495/COMENGAPP.V2I3.35
S. Wakode
Any tracking algorithm must be able to detect moving objects of interest in its field of view and then track them from frame to frame. Tracking algorithms based on mean shift are robust and efficient, but they have limitations: inaccurate target localization, failure when the tracked object passes another object with similar features (occlusion), and fast object motion. This paper proposes and compares an improved adaptive mean shift algorithm and an adaptive mean shift using a convex kernel function together with motion information. Experimental results show that both methods track the object without tracking errors. The adaptive method gives lower computation cost and proper target localization, while mean shift with a convex kernel function handles the tracking challenges of partial occlusion and fast object motion that the basic mean shift algorithm faces.
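The core mean shift iteration behind all of these variants is short: repeatedly move a search window to the centroid of a weight image (e.g. a histogram back-projection of the target's colors) until it stops moving. A minimal sketch with a flat kernel on a synthetic weight image; the window size, tolerance, and blob parameters are illustrative choices, not the paper's:

```python
import numpy as np

def mean_shift(weights, start, win=15, n_iter=20, eps=0.5):
    """Shift a square window toward the centroid of the weight image
    (e.g. a histogram back-projection) until the shift is below eps."""
    cy, cx = float(start[0]), float(start[1])
    h, w = weights.shape
    for _ in range(n_iter):
        y0, y1 = max(0, int(cy) - win), min(h, int(cy) + win + 1)
        x0, x1 = max(0, int(cx) - win), min(w, int(cx) + win + 1)
        patch = weights[y0:y1, x0:x1]
        total = patch.sum()
        if total == 0:
            break                               # no target evidence in window
        ys, xs = np.mgrid[y0:y1, x0:x1]
        ny = (ys * patch).sum() / total         # weighted centroid
        nx = (xs * patch).sum() / total
        if abs(ny - cy) < eps and abs(nx - cx) < eps:
            break                               # converged
        cy, cx = ny, nx
    return cy, cx

# Synthetic "back-projection": a bright blob centred at (40, 60).
yy, xx = np.mgrid[0:100, 0:100]
bp = np.exp(-((yy - 40) ** 2 + (xx - 60) ** 2) / 50.0)
cy, cx = mean_shift(bp, start=(30, 50))
```

The adaptive and convex-kernel variants change how `weights` is built and how the window size adapts per frame; the iteration itself stays the same.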
{"title":"Improvement and Comparison of Mean Shift Tracker using Convex Kernel Function and Motion Information","authors":"S. Wakode","doi":"10.18495/COMENGAPP.V2I3.35","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V2I3.35","url":null,"abstract":"Any tracking algorithm must be able to detect interested moving objects in its field of view and then track it from frame to frame. The tracking algorithms based on mean shift are robust and efficient. But they have limitations like inaccuracy of target localization, object being tracked must not pass by another object with similar features i.e. occlusion and fast object motion. This paper proposes and compares an improved adaptive mean shift algorithm and adaptive mean shift using a convex kernel function through motion information. Experimental results show that both methods track the object without tracking errors. Adaptive method gives less computation cost and proper target localization and Mean shift using convex kernel function shows good results for the tracking challenges like partial occlusion and fast object motion faced by basic Mean shift algorithm.","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130574677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-12-11DOI: 10.18495/COMENGAPP.V2I3.32
Rossi Passarella, Muhammad Fadli, S. Sutarno
In this paper we propose several steps to implement security for a door lock using hand gestures as the password. The methods considered are image preprocessing, skin detection, and convexity defects. The main components of the system are a camera, a personal computer (PC), a microcontroller, and a motor (lock). Bluetooth communication is used between the PC and the microcontroller to open and lock the door with command characters such as "O" and "C". The results show that the hand gestures can be measured, identified, and quantified consistently.
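The convexity-defect step can be illustrated without a camera: for each edge of the hand contour's convex hull, the deepest contour point between the edge's endpoints is a defect, and defects deeper than a threshold correspond to the valleys between extended fingers (this mirrors the idea behind OpenCV's `convexityDefects`; the toy contour, hull indices, and threshold below are invented for the example).

```python
import numpy as np

def convexity_defects(contour, hull_idx, min_depth=2.0):
    """For each hull edge, find the deepest contour point between its
    endpoints (perpendicular distance to the edge). Returns tuples
    (start_idx, end_idx, deepest_idx, depth) for depths >= min_depth."""
    defects = []
    n = len(contour)
    for k in range(len(hull_idx)):
        i, j = hull_idx[k], hull_idx[(k + 1) % len(hull_idx)]
        a, b = contour[i], contour[j]
        ab = b - a
        norm = np.hypot(ab[0], ab[1])
        depth, deep_idx = 0.0, None
        t = (i + 1) % n
        while t != j:                      # contour points strictly between i and j
            ap = contour[t] - a
            d = abs(ab[0] * ap[1] - ab[1] * ap[0]) / norm  # 2-D cross product
            if d > depth:
                depth, deep_idx = d, t
            t = (t + 1) % n
        if depth >= min_depth:
            defects.append((i, j, deep_idx, depth))
    return defects

# Toy two-finger "hand" contour; the hull indices skip the valley point.
hand = np.array([(0, 0), (0, 4), (1, 4), (1.5, 1), (2, 4), (3, 4), (3, 0)], float)
hull = [0, 1, 2, 4, 5, 6]
defects = convexity_defects(hand, hull)
fingers = len(defects) + 1    # one valley separates two extended fingers
```

Counting defects per frame gives a small, stable feature (number of extended fingers) that can be matched against the stored gesture password.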
{"title":"Design Concept of Convexity Defect Method on Hand Gestures as Password Door Lock","authors":"Rossi Passarella, Muhammad Fadli, S. Sutarno","doi":"10.18495/COMENGAPP.V2I3.32","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V2I3.32","url":null,"abstract":"In this paper we purpose a several steps to implement security for locking door by using hand gestures as password. The methods considered as preprocessing image, skin detection and Convexity Defection. The main components of the system are Camera, Personal Computer (PC), Microcontroller and Motor (Lock). Bluetooth communication are applied to communicate between PC and microcontroller to open and lock door used commands character such as “O” and “C”. The results of this system show that the hand gestures can be measured, identified and quantified consistently.","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114658499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-12-11DOI: 10.18495/COMENGAPP.V2I3.29
Seyed Mostafa Pourhashemi
The growing volume of spam emails has created the need for more accurate and efficient email classification systems. This research presents a machine learning approach for improving the accuracy of automatic spam detection and filtering, separating spam from legitimate messages. To reduce the error rate and increase efficiency, a hybrid feature selection architecture is used; the features are drawn from the body text of the messages. The proposed system combines two feature selection models, filter and wrapper, with Information Gain (IG) as the filter and Complement Naive Bayes (CNB) as the wrapper. For classification, Multinomial Naive Bayes (MNB), Discriminative Multinomial Naive Bayes (DMNB), Support Vector Machine (SVM), and Random Forest classifiers are used. Finally, the outputs of these classifiers and feature selection methods are examined, the best design is selected, and it is compared with other similar work across different parameters. The optimal accuracy of the proposed system is evaluated at 99%.
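The IG filter stage scores each term by how much knowing its presence reduces class uncertainty: IG(t) = H(C) − [P(t)·H(C|t) + P(¬t)·H(C|¬t)]. A minimal sketch on a made-up four-message corpus (the documents and labels are invented for illustration; a real system would compute this over the training vocabulary and keep the top-scoring terms):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(docs, labels, term):
    """IG(term) = H(C) - sum over presence/absence of P(t) * H(C | t)."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    cond = sum(len(g) / n * entropy(g) for g in (with_t, without) if g)
    return entropy(labels) - cond

docs = [{"win", "cash"}, {"win", "prize"}, {"meeting", "agenda"}, {"lunch", "agenda"}]
labels = ["spam", "spam", "ham", "ham"]
ig_win = info_gain(docs, labels, "win")    # perfectly separates the classes
ig_cash = info_gain(docs, labels, "cash")  # appears in only one spam message
```

The CNB wrapper stage then refines this filtered set by keeping the subset that actually maximizes the wrapper classifier's accuracy, before the final MNB/DMNB/SVM/Random Forest comparison.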
{"title":"E-mail spam filtering by a new hybrid feature selection method using IG and CNB wrapper","authors":"Seyed Mostafa Pourhashemi","doi":"10.18495/COMENGAPP.V2I3.29","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V2I3.29","url":null,"abstract":"The growing volume of spam emails has resulted in the necessity for more accurate and efficient email classification system. The purpose of this research is presenting an machine learning approach for enhancing the accuracy of automatic spam detecting and filtering and separating them from legitimate messages. In this regard, for reducing the error rate and increasing the efficiency, the hybrid architecture on feature selection has been used. Features used in these systems, are the body of text messages. Proposed system of this research has used the combination of two filtering models, Filter and Wrapper, with Information Gain (IG) filter and Complement Naive Bayes (CNB) wrapper as feature selectors. In addition, Multinomial Naive Bayes (MNB) classifier, Discriminative Multinomial Naive Bayes (DMNB) classifier, Support Vector Machine (SVM) classifier and Random Forest classifier are used for classification. Finally, the output results of this classifiers and feature selection methods are examined and the best design is selected and it is compared with another similar works by considering different parameters. 
The optimal accuracy of the proposed system is evaluated equal to 99%.","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117222244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-12-11DOI: 10.18495/COMENGAPP.V2I3.34
Neeta Gargote, S. Devaraj, S. Shahapure
Color image segmentation is probably the most important task in image analysis and understanding. A novel human-perception-based color image segmentation system is presented in this paper. The system uses a neural network architecture whose neurons use a multisigmoid activation function, which is the key to the segmentation. The number of steps, i.e., thresholds, in the multisigmoid function depends on the number of clusters in the image. The threshold values for detecting the clusters, and their labels, are found automatically from the first-order derivative of the saturation and intensity histograms in the HSI color space. The main use of the neural network here is to detect the number of objects in an image automatically; it labels the objects with their mean colors. The algorithm is found to be reliable and works satisfactorily on different kinds of color images.
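A multisigmoid activation can be read as a staircase built from one sigmoid step per threshold, so pixel values falling between thresholds settle near integer cluster labels 0, 1, 2, …. A minimal sketch; the threshold values and steepness below are made-up (the paper derives the thresholds from histogram-derivative valleys):

```python
import math

def multisigmoid(x, thresholds, steepness=10.0):
    """Staircase activation: a sum of sigmoids, one step per threshold,
    so the output plateaus near 0, 1, 2, ... between thresholds."""
    return sum(1.0 / (1.0 + math.exp(-steepness * (x - t))) for t in thresholds)

# Illustrative thresholds; in the paper these come from the first-order
# derivative of the saturation/intensity histograms.
levels = [0.2, 0.5, 0.8]
outputs = [round(multisigmoid(x, levels)) for x in (0.1, 0.35, 0.65, 0.95)]
```

Rounding the activation output directly yields the cluster label for each pixel, which is why the number of thresholds must match the number of clusters.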
{"title":"Human Perception Based Color Image Segmentation","authors":"Neeta Gargote, S. Devaraj, S. Shahapure","doi":"10.18495/COMENGAPP.V2I3.34","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V2I3.34","url":null,"abstract":"Color image segmentation is probably the most important task in image analysis and understanding. A novel Human Perception Based Color Image Segmentation System is presented in this paper. This system uses a neural network architecture. The neurons here uses a multisigmoid activation function. The multisigmoid activation function is the key for segmentation. The number of steps ie. thresholds in the multisigmoid function are dependent on the number of clusters in the image. The threshold values for detecting the clusters and their labels are found automatically from the first order derivative of histograms of saturation and intensity in the HSI color space. Here the main use of neural network is to detect the number of objects automatically from an image. It labels the objects with their mean colors. The algorithm is found to be reliable and works satisfactorily on different kinds of color images.","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132519147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-09-16DOI: 10.18495/COMENGAPP.V2I2.25
T. Prasath, D. Rampriya
In current computing trends, the pervasive nature of the environment requires systems to be platform independent and device independent. The system is developed using Structured Query Language and a middleware infrastructure that collects information from various nodes. An essential feature of the proposed middleware architecture is device independence as its major supporting capability, which makes adding new device types to the system easy through device self-description. The work focuses mainly on issues arising from the heterogeneity of the different devices composing a pervasive system, investigated both at the data management level and at the physical integration level, and resolves the related issues with corresponding solutions using a nontrivial approach. Keywords: Perla, Cloud Monitoring, Middleware, Declarative Language
{"title":"Computing Trends and Converging Technological Factors With Device Metric Enhancement to Envelop Application Wide Middleware Infrastructure","authors":"T. Prasath, D. Rampriya","doi":"10.18495/COMENGAPP.V2I2.25","DOIUrl":"https://doi.org/10.18495/COMENGAPP.V2I2.25","url":null,"abstract":"In the emerging trends the pervasive nature across the computing environment shows that the system is platform independent and device independent. The system development is designed with the help of Structured Query Language and middleware infrastructure that are used to collect the information from various nodes. An essential feature of this proposed middleware architecture suites the device independent as the major supporting capability to the system. This facilitates to add new device types in the system feels easy through the use of device self-description. It mainly focuses on the issues related to the heterogeneity of the different devices composing a pervasive system: This aspect is investigated both at data management and at physical integration levels. Using the nontrivial approach aims at handling the related issues are resolved with the corresponding solution. Keyword: Perla, Cloud Monitoring, Middleware, Declarative Language","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131368331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-09-16DOI: 10.18495/comengapp.v2i2.26
A. Kalpana, P. Rambabu, D. LakshmiSreeniuvasareddy
An outlier is a data point that differs significantly from the remaining data points; outliers are also referred to as discordants, deviants, or abnormalities. Outliers can be of particular interest, as in credit card fraud detection, where they indicate fraudulent activity. Outlier detection is thus an interesting data mining task, referred to as outlier analysis. Detecting outliers efficiently in a dataset is important in many fields such as credit card fraud, medicine, law enforcement, and earth sciences. Many methods are available to identify outliers in numerical datasets, but only a limited number of methods exist for categorical and mixed-attribute datasets. This work proposes a novel outlier detection method that finds anomalies based on each record's "multi-attribute outlier factor through correlation" score, which has great intuitive appeal. The algorithm uses the frequency of each value in the categorical part of the dataset and the correlation factor of each record with the mean record of the entire dataset, applying the Attribute Value Frequency (AVF) score concept to the categorical part. Results of the proposed method are compared with existing methods. The experiments use the Bank (mixed) dataset from the UCI machine learning repository. Keywords: Outlier, Mixed Attribute Datasets, Attribute Value Frequency Score
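The AVF part of the score is simple to state: a record's score is the mean frequency of its categorical values across the dataset, and low scores flag likely outliers (rare value combinations). A minimal sketch on a made-up four-record dataset; the paper's full method additionally combines this with a correlation factor against the mean record, which is omitted here:

```python
from collections import Counter

def avf_scores(records):
    """Attribute Value Frequency: each record scores the mean frequency
    of its categorical values; the lowest scores mark likely outliers."""
    n_attrs = len(records[0])
    counts = [Counter(rec[j] for rec in records) for j in range(n_attrs)]
    return [sum(counts[j][rec[j]] for j in range(n_attrs)) / n_attrs
            for rec in records]

data = [("red", "low"), ("red", "low"), ("red", "high"), ("blue", "low")]
scores = avf_scores(data)
outlier = data[scores.index(min(scores))]   # a record holding a rare value
```

Here both `("red", "high")` and `("blue", "low")` tie for the lowest score, since each carries one value that appears only once.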
{"title":"A Novel Technique to Find Outliers in Mixed Attribute Datasets","authors":"A. Kalpana, P. Rambabu, D. LakshmiSreeniuvasareddy","doi":"10.18495/comengapp.v2i2.26","DOIUrl":"https://doi.org/10.18495/comengapp.v2i2.26","url":null,"abstract":"An Outlier is a data point which is significantly different from the remaining data points. Outlier is also referred as discordant, deviants and abnormalities. Outliers may have a particular interest, such as credit card fraud detection, where outliers indicate fraudulent activity. Thus, outlier detection analysis is an interesting data mining task, referred to as outlier analysis. Detecting outliers efficiently from dataset is an important task in many fields like Credit card Fraud, Medicine, Law enforcement, Earth Sciences etc. Many methods are available to identify outliers in numerical dataset. But there exist limited number of methods are available for categorical and mixed attribute datasets. In the proposed work, a novel outlier detection method is proposed. This proposed method finds anomalies based on each record’s “multi attribute outlier factor through correlation” score and it has great intuitive appeal. This algorithm utilizes the frequency of each value in categorical part of the dataset and correlation factor of each record with mean record of the entire dataset. This proposed method used Attribute Value Frequency score (AVF score) concept for categorical part. Results of the proposed method are compared with existing methods. The Bank data (Mixed) is used for experiments in this paper which is taken from UCI machine learning repository. 
Keyword: Outlier, Mixed Attribute Datasets, Attribute Value Frequency Score","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129948423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}