Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599404
Quan Li, Wei Wu, Yang Su
Image denoising has advanced considerably through the use of image self-similarity together with sparse representation and low-rank theory. State-of-the-art image restoration techniques such as BM3D and SAIST have been proposed and applied to various vision tasks. In this paper, we propose an enhanced SAIST algorithm for image denoising. The improvements are twofold. First, similar-block matching depends on block distances, which are corrupted by noise. We therefore introduce DCT pre-filtering before aggregation, because it effectively suppresses measurement errors in the block distances. Second, the plain sample mean ignores the relevance of individual image patches, which affects singular value thresholding. We therefore propose a weighted sample mean to make the singular value thresholding more adaptive. Experimental results show that the improved algorithm outperforms the original algorithm in both objective criteria and subjective visual quality.
Title: An Enhanced Lowrank Algorithm for Image Denoising. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
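The weighted sample mean mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the weighting formula, so the Gaussian kernel on patch distance, the bandwidth `h`, and the function names below are assumptions made purely for illustration; the toy patches are invented.

```python
import math

def patch_distance(p, q):
    """Mean squared difference between two equally sized, flattened patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def weighted_patch_mean(reference, patches, h=10.0):
    """Average `patches`, weighting each one by its similarity to `reference`.

    The kernel exp(-d / h^2) is an assumed weighting, not the paper's formula.
    """
    weights = [math.exp(-patch_distance(reference, p) / (h * h)) for p in patches]
    total = sum(weights)
    size = len(reference)
    return [sum(w * p[i] for w, p in zip(weights, patches)) / total
            for i in range(size)]

reference = [10.0, 12.0, 11.0, 13.0]
patches = [
    [10.0, 12.0, 11.0, 13.0],  # identical to the reference
    [11.0, 13.0, 12.0, 14.0],  # close match
    [90.0, 95.0, 92.0, 97.0],  # poor match, receives almost no weight
]

weighted = weighted_patch_mean(reference, patches)
# Plain (unweighted) sample mean for comparison
plain = [sum(p[i] for p in patches) / len(patches) for i in range(len(reference))]
```

With these toy values the plain mean is pulled far from the reference by the mismatched patch, while the weighted mean stays close to it, which is the motivation the abstract gives for weighting.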
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599394
Dong-mei Tang, Yan-hua Zeng, Y. Fu, Bo Zhang
Test sieves are widely used, but their calibration is time-consuming and suffers from low objectivity and accuracy. To solve these problems, a calibration device for test sieves based on an image measuring instrument is designed and developed. First, the design idea and principle of the calibration device are introduced. Second, the hardware and software systems of the device are studied. Finally, the reliability of the device is verified by stability and comparison experiments. The device achieves automatic calibration with high accuracy, improved efficiency, and reduced labor intensity.
Title: Calibration Device for Test Sieves Based on Image Measuring Instrument. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599506
Yuanyuan Sun, Cong Jin, Wei Zhao, Nansu Wang
Real-time beat tracking faces several problems: the true beat value is uncertain; it is difficult to approximate people's perception of music and to place beats according to how listeners feel them; and most data sets are private and small, which affects the accuracy of experimental results. To address these problems, a real-time beat tracking method based on an LSTM neural network is proposed. It abandons the traditional approach of determining beat positions. Instead, it divides beats into five levels according to their strength and then trains an LSTM network on this beat information. Experiments show that the system functions well and that the training accuracy reaches 0.946.
Title: Design of real-time rhythm tracking system based on neural network. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599330
Tingyu Zhou, Tieyuan Pan, Zhiguo Bao, Takahiro Watanabe
Field-programmable gate arrays (FPGAs) have enormous potential in the field of integrated circuits (ICs) due to their programmability, short design cycle, and high flexibility in parallel computing. Nevertheless, increasing chip integration and shrinking transistor sizes lead to non-negligible power dissipation in FPGAs. In particular, leakage power, a crucial part of FPGA power consumption, urgently requires attention. In this paper, a time-based leakage-power-aware algorithm (TBLA) is proposed to address this issue on 2D dynamic partially reconfigurable FPGAs. Experimental results show that, compared with traditional algorithms, the proposed TBLA algorithm reduces leakage power and scheduling overhead without increasing the overall execution time of an application.
Title: A Time-based Leakage-aware Algorithm for Task Placement and Scheduling Problem on Dynamic Reconfigurable FPGA. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599297
Yun Liu, Buyang Cao, Yahui Liu
Predicting the fault rate of a cellphone is essential for the supply chain management of spare parts. However, the fault rate is affected by many factors, which makes it difficult to predict. In this work, new concepts for fault rate prediction based on grey model theory, such as the grey fault rate and the grey model fault count, are proposed. The grey fault rate is found to be consistent with the bathtub curve widely applied in reliability engineering. Grey model theory is used to mitigate the negative effect of random individual faults on the prediction. A characteristic value of the grey failure rate is defined to describe the fault rate of particular phone models. We develop a method to predict the faults of a new phone model from the data of older phone models and their grey failure rates. The proposed method is applied to fault rate prediction for two cellphone models, yielding a prediction deviation of about 2% over 3 years.
Title: The Prediction of Cellphones’ Fault Rates with Grey Models. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
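For readers unfamiliar with grey models, the following is a minimal sketch of the classic GM(1,1) model. The abstract does not say which grey model the authors use or how the "grey fault rate" is computed, so GM(1,1) is an assumption here, and the input series is made up for illustration.

```python
import math

def gm11_fit(x0):
    """Fit GM(1,1) to a series; return (a, b) in x0[k] + a * z1[k] = b."""
    # Accumulated generating operation (running sum) of the raw series
    x1, running = [], 0.0
    for value in x0:
        running += value
        x1.append(running)
    # Background values: means of consecutive accumulated values
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    n = len(z)
    # Ordinary least squares for the line y = slope * z + intercept
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)
    intercept = (sy - slope * sz) / n
    return -slope, intercept

def gm11_predict(x0, steps):
    """Forecast the next `steps` values (assumes a != 0, i.e. a non-constant series)."""
    a, b = gm11_fit(x0)
    c = x0[0] - b / a

    def x1_hat(k):
        # Time response of the accumulated series
        return c * math.exp(-a * k) + b / a

    n = len(x0)
    # Undo the accumulation by differencing consecutive fitted values
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]

# Hypothetical series growing by 10% per period (not data from the paper)
series = [2.0, 2.2, 2.42, 2.662, 2.9282]
a, b = gm11_fit(series)
next_value = gm11_predict(series, 1)[0]
```

GM(1,1) needs only a handful of observations, which is one reason grey models are often chosen when historical data are sparse. A negative development coefficient `a` indicates a growing series.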
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599432
Ruifang Li, L. Jin
In traditional cognitive wireless networks, most studies on spectrum allocation assume a static network topology. However, vehicles in a cognitive vehicular network move at high speed and the network topology changes frequently, which makes spectrum allocation more challenging. In this paper, we take these factors into account and establish, in our spectrum allocation model, a connection between the remaining available time of the primary user and the time required by the cognitive vehicle. Maximizing network throughput in a heterogeneous spectrum environment requires a rapidly converging algorithm that adapts to the dynamic cognitive vehicular network environment. We therefore propose the improved adaptive binary cuckoo search (IABCS) algorithm, which incorporates the simplex method into the adaptive binary cuckoo algorithm. Experimental results indicate that, compared with the original standard cuckoo search (CS) algorithm and the improved particle swarm optimization (PSO) algorithm, the spectrum allocation method based on the improved adaptive cuckoo algorithm converges faster and achieves higher throughput.
Title: Improved Cuckoo Algorithm for Spectrum Allocation in Cognitive Vehicular Network. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599411
Yuanyuan Zhai, Ying-Li Chen, Yue Jiang, Qianzhong Li
Lung cancer is the most common form of malignant tumor and is influenced by a complex molecular network. A better understanding of the molecular mechanisms that lead to lung cancer would aid the detection and monitoring of cancer, thereby improving clinical and personalized cancer treatment. In this study, 47 co-expression modules were identified by constructing a weighted gene co-expression network. We then investigated the biological significance of these modules by studying GO biological processes and KEGG pathways. The results show that two significant modules, the green module and the green-yellow module, are enriched in blood vessel development and in immune response and regulation, respectively. The top 50 genes of the two modules contain 3 lncRNAs and some hub genes, respectively. Therefore, these lncRNAs and the hub genes SPTBN1, SFTPC, FHL1, and RP5-826L7 in the green module and FCER1G, NLRC4, and SAMHD1 in the green-yellow module may be associated with lung adenocarcinoma. Experimental evidence indicates that they may play a crucial part in the pathogenesis of lung adenocarcinoma. In addition, the analysis of these hub genes may provide a reference for further study of the pathogenesis of lung cancer.
Title: Weighted Gene Co-expression Network Analysis of Gene Modules for Lung Adenocarcinoma. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
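The core step of a weighted gene co-expression network can be sketched as follows: pairwise Pearson correlations are raised to a soft-threshold power to give an adjacency, and genes with high summed adjacency (connectivity) are hub candidates. The gene names, expression values, and the power `beta=6` are placeholders chosen for illustration; the abstract does not report the settings used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(i, j)| ** beta for every gene pair."""
    genes = list(expr)
    return {g: {h: abs(pearson(expr[g], expr[h])) ** beta
                for h in genes if h != g}
            for g in genes}

def connectivity(adjacency):
    """Sum of a gene's adjacencies to all other genes."""
    return {gene: sum(row.values()) for gene, row in adjacency.items()}

# Toy expression profiles over five samples (hypothetical values)
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly correlated with both
}
adjacency = soft_threshold_adjacency(expression)
k = connectivity(adjacency)
```

Raising the correlation to a power suppresses weak correlations much more strongly than strong ones, so `geneC` ends up with far lower connectivity than the tightly co-expressed pair.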
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599288
Youmin Zhang, Li Liu, Shun Fu, Fujin Zhong
Entity alignment across knowledge graphs is an important task in web mining. Aligned entities can be used to transfer knowledge across knowledge graphs and benefit tasks such as cross-lingual knowledge graph construction and knowledge reasoning. This paper proposes a representation-learning-based algorithm for embedding knowledge graphs and aligning entities. In particular, considering the multiple relation types in a knowledge graph, we select representative relations driven by the alignment task, based on the pre-aligned entity pairs. With the help of the selected relations, we embed the entities across networks into a common space by modeling entities in head and tail positions with corresponding context vectors. For the entity alignment task, pre-aligned entities are used to facilitate the transfer of context information across the knowledge graphs. In this way, entity embedding and alignment can be solved simultaneously under a unified framework. Extensive experiments on two multi-lingual knowledge graphs demonstrate the effectiveness of the proposed model compared with several state-of-the-art models.
Title: Entity Alignment Across Knowledge Graphs Based on Representative Relations Selection. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599447
Min Xiao, Mingzi Xiao, Jing Liang, Yan Shi
This paper introduces a temperature control system for agricultural greenhouses based on ZigBee technology. The DS18B20 in-line temperature sensor is used for temperature collection; it converts the temperature directly into a digital value. For wireless communication, the application layer of the ZigBee protocol stack is modified to implement communication over the ZigBee protocol. Temperature collection and wireless communication are both handled by CC2530 functional node modules, programmed in the IAR Embedded Workbench integrated development environment. One CC2530 node acts as a terminal node that collects the temperature and sends it. The other CC2530 node acts as a coordinator that receives the temperature and forwards it to the STM32F103RBT6 development board over a serial port. The collected temperature is shown on a 3.2-inch TFT LCD screen; the STM32F103RBT6 board drives the display and also drives the rotation of a stepping motor. The whole development process was carried out in the MDK software from Keil. Data are exchanged between the CC2530 functional node module and the STM32F103RBT6 development board through serial communication. System tests indicate that data are transmitted correctly and stably, so the greenhouse management requirements can be met.
Title: ZigBee-based Temperature Controlling System for Agricultural Greenhouses. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
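To make the sensor's digital output concrete, here is a small sketch of how the two temperature bytes of a DS18B20 are decoded. It assumes the sensor's default 12-bit resolution, where one least significant bit corresponds to 1/16 °C. The function name is hypothetical, and the paper's firmware runs on the CC2530 rather than in Python.

```python
def ds18b20_to_celsius(lsb: int, msb: int) -> float:
    """Decode the two DS18B20 temperature bytes (12-bit mode) to degrees Celsius."""
    raw = (msb << 8) | lsb   # combine into one 16-bit value, low byte first
    if raw & 0x8000:         # sign bit set: value is two's-complement negative
        raw -= 0x10000
    return raw / 16.0        # 1 LSB = 0.0625 degC at 12-bit resolution

# Example register values taken from the sensor's documented encoding
warm = ds18b20_to_celsius(0x91, 0x01)   # raw 0x0191
cold = ds18b20_to_celsius(0x5E, 0xFF)   # raw 0xFF5E
```

Handling the sign explicitly matters for greenhouse use in winter, since sub-zero readings would otherwise decode as very large positive temperatures.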
Pub Date: 2018-11-01, DOI: 10.1109/ICSAI.2018.8599375
C. R. Haruna, Mengshu Hou, M. J. Eghan, Michael Y. Kpiebaareh, Lawrence Tandoh, Barbie Eghan-Yartel, Maame G. Asante-Mensah
In the real world, databases often contain several records representing the same entity, and these duplicates share no common key, which makes deduplication difficult. Machine-based and crowdsourcing techniques have been used separately to improve the quality of data deduplication. Crowdsourcing has been used for tasks that machine-based algorithms handle poorly. Although crowds provide relatively more accurate results than machines, both approaches are slow in execution and hence expensive to implement. In this paper, a hybrid human-machine system is proposed in which machines first process the data set and humans then identify potential duplicates. We performed experiments on three benchmark datasets: paper, restaurant, and product. Compared with existing techniques, our approach outperformed some methods, achieving high deduplication accuracy and good deduplication efficiency while incurring low crowdsourcing costs.
Title: Cost-Based and Effective Human-Machine Based Data Deduplication Model in Entity Reconciliation. Published in: 2018 5th International Conference on Systems and Informatics (ICSAI).
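The machine-first, human-second idea can be sketched as a triage step: a cheap similarity score filters record pairs, and only the uncertain ones are sent to crowd workers. The abstract does not describe the actual similarity measure or decision rule, so the Jaccard score, the two thresholds, and the sample records below are all assumptions for illustration.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two records."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def triage_pairs(records, low=0.3, high=0.8):
    """Split record pairs into machine-accepted duplicates and crowd questions.

    Pairs scoring below `low` are dropped as non-duplicates. Both thresholds
    are hypothetical values, not taken from the paper.
    """
    auto_duplicates, ask_crowd = [], []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = jaccard(a, b)
        if score >= high:
            auto_duplicates.append((i, j))
        elif score >= low:
            ask_crowd.append((i, j))
    return auto_duplicates, ask_crowd

# Made-up restaurant records
records = [
    "golden dragon restaurant 12 main street",
    "Golden Dragon Restaurant 12 Main Street",
    "golden dragon 12 main st",
    "blue lotus cafe 98 river road",
]
auto_duplicates, ask_crowd = triage_pairs(records)
```

Here only two of the six candidate pairs would reach the crowd, which shows how a machine pass can reduce crowdsourcing cost.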