Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943228
K. Dang, Shanu Sharma
With the tremendous growth of video and image databases, there is a great need for automatic understanding and examination of data by intelligent systems, as manual analysis is becoming infeasible. Narrowing this down to one specific domain, among the most frequently traced objects in images are people, i.e. faces. Face detection is becoming more challenging as its use grows across applications. It is the first step for face recognition, face analysis and the detection of other facial features. In this paper, various face detection algorithms are discussed and analyzed, including Viola-Jones, SMQT features with the SNoW classifier, neural network-based face detection and support vector machine-based face detection. These face detection methods are compared based on precision and recall values calculated using the DetEval software, which evaluates the precise placement of the bounding boxes around faces to give accurate results.
Title: Review and comparison of face detection algorithms
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 629-633
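The DetEval comparison above rests on matching predicted bounding boxes against ground-truth boxes. As an illustration (not DetEval's exact protocol, which uses a more elaborate one-to-many matching scheme), a minimal precision/recall computation with greedy IoU matching might look like this; the 0.5 overlap threshold is an assumption:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(detections, ground_truth, threshold=0.5):
    """Greedy one-to-one matching: a detection is a true positive if it
    overlaps a not-yet-matched ground-truth box by at least threshold."""
    matched = set()
    tp = 0
    for det in detections:
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(det, gt) >= threshold:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

With two ground-truth faces and one correct detection plus one false alarm, both precision and recall come out at 0.5, which is the trade-off the paper's comparison tables capture.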
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943175
Annu Kumari, S. Singh
Do we really know the meaning of Online Influence (OI) Maximization? Why do we need to measure influence on social networking sites, and how is it measured? If you have been pondering these questions, this paper will help you understand and connect with online influence maximization. Influence maximization is the problem of finding a subset of seed nodes within a social network that maximizes influence over the other nodes through their ties and relationships. It was developed to determine how influence propagates through a network. The core idea of influence maximization is to select a minimal set of seed nodes that propagates the maximum influence within the network. This paper first reviews previous models used for appropriate seed-node selection, i.e. the Linear Threshold (LT) model, the classic Independent Cascade (IC) model and the Extended Independent Cascade (EIC) model. It then proposes a new Rapid Continuous Time (RCT) Independent Cascade model that extends the classic IC model.
Title: Online influence maximization using rapid continuous time independent cascade model
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 356-361
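The classic Independent Cascade model reviewed above can be sketched as a simple Monte Carlo simulation: each newly activated node gets one chance to activate each inactive neighbour. This is an illustrative baseline, not the paper's RCT variant; the graph format, the uniform activation probability p and the seed choice are assumptions:

```python
import random

def independent_cascade(graph, seeds, p=0.1, rng=None):
    """One diffusion run of the Independent Cascade (IC) model.

    graph: dict mapping node -> list of neighbour nodes.
    seeds: initially active nodes. Returns the set of nodes that are
    active once the cascade dies out.
    """
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nb in graph.get(node, []):
                if nb not in active and rng.random() < p:
                    active.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return active

def estimate_spread(graph, seeds, p=0.1, runs=1000):
    """Monte Carlo estimate of the expected number of activated nodes,
    which is the quantity influence maximization tries to maximize."""
    rng = random.Random(42)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs
```

Seed selection then amounts to picking the seed set whose estimated spread is largest, which is what LT/IC-style algorithms approximate greedily.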
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943160
Anil Bamwal
This paper describes how a distributed cloud computing system is used to run scientific applications efficiently. Such a system is generally formed from a number of different Infrastructure-as-a-Service (IaaS) clouds combined into an integrated infrastructure. The distributed cloud has been in production for the last four years, with around 800,000 completed jobs and an average of about 500 simultaneous jobs executing for about 12 hours per day. The design and implementation of the system, based on some custom components and a number of pre-existing components, are reviewed. The paper also discusses the operation of the system and plans for increasing its computing capacity and expanding to more sites.
Title: Efficient management of distributive cloud computing system for scientific applications
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 262-268
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943250
Sanjay Chouhan, Manish Gupta, D. K. Panda
The paper presents the design of a 2×1 antenna for dual-band WLAN applications with a coupling element to reduce mutual coupling. The coupling structure is located between two patch antennas. The design incorporates a series of hexagonal cuts to shrink the size of the patch antenna, and the proposed coupling structure improves the isolation between the two closely placed antenna elements. The 2×1 MIMO antenna for the 2.4 GHz and 5.2 GHz frequency bands is presented and simulated both with and without the coupling element. The design uses FR-4 material with a height of 1.524 mm, and the size of a single patch antenna is 16×16 mm. Simulation results show isolation improvements of 15 dB and 2 dB with the coupling element for the 2.4 GHz and 5.2 GHz bands respectively.
Title: Dual band compact antenna with series of hexagonal cut and coupling structure for isolation enhancement
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 750-753
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943158
Meetu Kandpal, Monica Gahlawat, Kalyani Patel
In the era of Big Data analytics, predictive modeling plays an important role in predicting future demand and behavior from historical data. As the majority of IT companies move to cloud services, cloud providers such as Amazon, Google Cloud and Microsoft Azure are interested in knowing the future demand for computing resources so that they can derive new pricing schemes to gain more profit. Providers use different pricing schemes for computing resources: Amazon, for example, provides three schemes, namely on-demand pricing, reserved pricing and auction pricing, while Microsoft offers schemes such as Pay-As-You-Go and prepaid subscriptions. This paper presents a survey of the role of predictive modeling in cloud service pricing. The survey shows that predictions made by various authors are close to actual outcomes, which highlights the importance of predictive modeling for forecasting future demand for cloud computing resources and deciding resource prices.
Title: Role of predictive modeling in cloud services pricing: A survey
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 249-254
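As a toy illustration of forecasting demand from historical data (the surveyed papers use far richer models), an ordinary least-squares trend fit over past demand could look like this; the linear-trend assumption and the unit-spaced time steps are ours:

```python
def fit_linear_trend(demand):
    """Ordinary least-squares fit of demand ~ a + b * t, where t is
    the index 0, 1, 2, ... of each historical observation."""
    n = len(demand)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_d = sum(demand) / n
    b = sum((t - mean_t) * (d - mean_d) for t, d in zip(ts, demand)) \
        / sum((t - mean_t) ** 2 for t in ts)
    a = mean_d - b * mean_t
    return a, b

def forecast(demand, steps):
    """Extrapolate the fitted trend `steps` periods past the data."""
    a, b = fit_linear_trend(demand)
    n = len(demand)
    return [a + b * (n + k) for k in range(steps)]
```

A provider could feed such forecasts into a pricing rule, e.g. raising spot-style prices when predicted demand approaches capacity.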
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943186
Daniel Fraunholz, Marc Zimmermann, S. D. Antón, Jörg Schneider, H. Dieter Schotten
Recently, the increase in interconnectivity has led to a rising number of IoT-enabled devices in botnets. Such botnets are currently used for large-scale DDoS attacks. To keep track of these malicious activities, honeypots have proven to be a vital tool. We developed and set up a distributed and highly scalable WAN honeypot with an attached backend infrastructure for sophisticated processing of the gathered data. To make the processed data understandable, we designed a graphical frontend that displays all relevant information obtained from the data. We group attacks originating from one source within a short period of time into sessions. This enriches the data and enables a more in-depth analysis. We produced common statistics such as usernames, passwords, username/password combinations, password lengths, originating countries and more. From the information gathered, we were able to identify common dictionaries used for brute-force login attacks, as well as more sophisticated statistics such as login attempts per session and attack efficiency.
Title: Distributed and highly-scalable WAN network attack sensing and sophisticated analysing framework based on Honeypot technology
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 416-421
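The session grouping described above (attacks from one source within a short time window) can be sketched as follows; the 60-second gap and the (timestamp, source IP) event format are assumptions for illustration, not details taken from the paper:

```python
def group_sessions(events, gap=60):
    """Group attack events into sessions: consecutive events from the
    same source IP separated by at most `gap` seconds form one session.

    events: iterable of (timestamp_seconds, source_ip) tuples.
    Returns a dict mapping source_ip -> list of sessions, where each
    session is a list of timestamps in ascending order.
    """
    by_ip = {}
    for ts, ip in sorted(events):
        sessions = by_ip.setdefault(ip, [])
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # continue the current session
        else:
            sessions.append([ts])     # start a new session
    return by_ip
```

Per-session statistics such as login attempts per session then follow directly from the lengths of the grouped lists.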
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943185
Rajesh Mishra, Bhupendra Singh, Shobhna Tiwari
Safe operation of rail vehicles is a matter of concern. A communication-based train control (CBTC) network is a data-communication-based automated control network that ensures the safety of rail vehicles. In this network, status and control commands are transmitted using WLAN technology. It has been observed that WLAN is less successful at high speeds because, owing to its design constraints, packet drops cannot be avoided. In the present work, random packet drops in CBTC systems during the handover process are analyzed. Unlike existing work that only considers packet drop formulation under specific conditions, we analyze system behavior under a worst-case scenario by varying one parameter over its range and analyzing its impact on packet drops and other related parameters of the CBTC system. Simulation results are presented and compared against existing results under specific conditions.
Title: Performance improvement of communication based high speed train control system with packet drops during handover under worst case scenario
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 412-415
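The parameter sweep described above can be illustrated with a toy Monte Carlo model in which each packet is dropped independently with a fixed probability. Real CBTC handover behaviour is far more complex, so treat this purely as a sketch of the sweep methodology, with the drop probability as the one parameter being varied:

```python
import random

def handover_drop_rate(drop_prob, packets=10000, seed=1):
    """Monte Carlo estimate of the fraction of packets lost when each
    packet is dropped independently with probability drop_prob."""
    rng = random.Random(seed)
    dropped = sum(rng.random() < drop_prob for _ in range(packets))
    return dropped / packets

def sweep(drop_probs, packets=10000):
    """Vary the drop probability over a range and record the observed
    drop rate for each value, mirroring a one-parameter sweep."""
    return {p: handover_drop_rate(p, packets) for p in drop_probs}
```

In an actual study, each swept value would feed a full handover simulation rather than an independent-drop model.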
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943224
Siddhant Bansal, Garima Mehta
The security of multimedia content such as images during transmission is a cause of concern in current times. Traditional watermarking techniques help identify the source as well as maintain patient metadata for biomedical images. Similarly, traditional image encryption techniques protect patient privacy. There is a need for a two-layer security approach with joint watermarking and encryption to improve over contemporary methods. This paper presents a comparative analysis of various joint encryption and watermarking algorithms for biomedical images to find the best pair of algorithms based on previous research. The comparative results also indicate that joint encryption and watermarking algorithms are suitable for securing biomedical images.
Title: Comparative analysis of joint encryption and watermarking algorithms for security of biomedical images
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 609-612
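As a minimal sketch of the joint watermarking-plus-encryption idea (not any of the specific algorithms compared in the paper), one could embed a watermark in pixel least-significant bits and then apply a toy XOR stream cipher. The embed-then-encrypt order, byte-valued pixels and single-byte key are all assumptions for illustration:

```python
def embed_watermark(pixels, bits):
    """Embed watermark bits into the LSB of the first len(bits) pixels."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

def extract_watermark(pixels, n):
    """Recover the first n watermark bits from pixel LSBs."""
    return [p & 1 for p in pixels[:n]]

def xor_encrypt(pixels, key):
    """Toy stream cipher: XOR each pixel with a repeating key byte.
    Applying it twice with the same key restores the original values."""
    return [p ^ key[i % len(key)] for i, p in enumerate(pixels)]
```

The two-layer property is visible here: the watermark survives a decrypt-then-extract round trip, while the ciphertext alone reveals neither image nor metadata. Production schemes use robust transform-domain watermarks and standard ciphers such as AES instead.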
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943207
Uma Ojha, Savita Goel
Breast cancer is the most common cancer in women, and thus early-stage detection of breast cancer can provide a potential advantage in the treatment of this disease. Early treatment not only helps to cure cancer but also helps to prevent its recurrence. Data mining algorithms can provide great assistance in the prediction of early-stage breast cancer, which has always been a challenging research problem. The main objective of this research is to find how precisely these data mining algorithms can predict the probability of recurrence of the disease among patients on the basis of the stated parameters. The research highlights the performance of different clustering and classification algorithms on the dataset. Experiments show that classification algorithms are better predictors than clustering algorithms. The results indicate that the decision tree (C5.0) and SVM are the best predictors, with 81% accuracy on the holdout sample, while fuzzy c-means had the lowest accuracy of 37% among the algorithms used in this paper.
Title: A study on prediction of breast cancer recurrence using data mining techniques
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 527-530
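The holdout evaluation described above can be sketched generically: shuffle the data, keep a fraction aside, train on the rest and report accuracy on the held-out portion. Here a 1-nearest-neighbour classifier stands in for C5.0/SVM, and the 30% holdout fraction is an assumption:

```python
import random

def holdout_accuracy(data, labels, classify, test_fraction=0.3, seed=0):
    """Shuffle, split into train/holdout, then score `classify`, a
    function (train_x, train_y, x) -> predicted label, on the holdout."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    correct = sum(
        classify([data[i] for i in train], [labels[i] for i in train],
                 data[j]) == labels[j]
        for j in test
    )
    return correct / len(test)

def nearest_neighbour(train_x, train_y, x):
    """1-NN: predict the label of the closest training point (squared
    Euclidean distance), a simple stand-in for the paper's classifiers."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]
```

Swapping `nearest_neighbour` for a decision tree or SVM (e.g. from scikit-learn) reproduces the paper's evaluation setup.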
Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943119
Rivindu Perera, P. Nand, Wen-Hsin Yang, Kohichi Toshioka
The consumption of Linked Data has dramatically increased with the growing momentum of the semantic web. Linked Data is essentially a very simple format for representing knowledge, in which all knowledge is expressed as triples that can be linked using one or more components of the triple. To date, most efforts have gone towards either creating Linked Data by mining the web or making it available as a knowledge base for knowledge engineering applications. In recent times there has been a growing need for these applications to interact with users in natural language, which requires the transformation of Linked Data knowledge into natural language. The aim of the RealText project described in this paper is to build a scalable framework to transform Linked Data into natural language by generating lexicalization patterns for triples. A lexicalization pattern is a syntactic pattern that transforms a given triple into a syntactically correct natural language sentence. Using DBpedia as the Linked Data resource, we generated 283 accurate lexicalization patterns for a sample set of 25 ontology classes. We performed a human evaluation on a test sub-sample with inter-rater agreements of 0.86 and 0.80 for readability and accuracy respectively. These results showed that the lexicalization patterns generated language that is accurate, readable and exhibits the qualities of human-produced language.
Title: Lexicalizing linked data for a human friendly web
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 30-35
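A lexicalization pattern as described above can be illustrated as a template with subject and object placeholders applied to an RDF-style triple. The `?s`/`?o` syntax and the per-predicate pattern table are simplifications for illustration, not RealText's actual pattern format:

```python
def lexicalize(triple, patterns):
    """Turn a (subject, predicate, object) triple into a sentence by
    applying the predicate's lexicalization pattern; fall back to a
    generic "has" template when no pattern is known."""
    s, p, o = triple
    pattern = patterns.get(p, "?s has " + p + " ?o.")
    return pattern.replace("?s", s).replace("?o", o)
```

For example, a pattern table entry `{"birthPlace": "?s was born in ?o."}` turns the DBpedia-style triple ("Ada Lovelace", "birthPlace", "London") into "Ada Lovelace was born in London."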