Extracting Geographic Knowledge from Wikipedia
D. Benhaddouche, Mohamed Tekkouk, Abdelghani Chernnouf Youcef
Proceedings of the 7th International Conference on Software Engineering and New Technologies, 2018. DOI: 10.1145/3330089.3330128
GIS is becoming a necessity in a wide variety of application domains, and the extraction of geographic information has become an important topic in computer science. This thesis aims to extract geographic data from Wikipedia so that users can obtain the information they want more easily. One problematic aspect is processing the very large XML dump files; we use text mining and machine learning techniques to address this problem. In this work, we present and evaluate an approach that extracts geographic data from Wikipedia's very large XML file and builds a geographic database. Our technique extracts infoboxes from geographic articles using a supervised machine learning method, support vector machines (SVM). We then create tables containing geographic data (name, longitude, latitude, etc.) and join the different tables to structure our result.
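To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation: it streams a MediaWiki XML dump so the whole file never sits in memory, classifies each article as geographic or not with a linear SVM, pulls latitude/longitude fields from the infobox wikitext with a simplified regex, and stores the rows in a single SQLite table. The file names, training examples, regexes, and table schema are all illustrative assumptions.

```python
# Hypothetical sketch of the extraction pipeline; file names, training data,
# regexes, and schema are illustrative, not taken from the paper.
import re
import sqlite3
import xml.etree.ElementTree as ET

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# 1. Train an SVM on a small labelled sample (placeholder training data).
train_texts = [
    "{{Infobox settlement | latitude = 35.7 | longitude = -0.6 }} A coastal city...",
    "{{Infobox scientist | fields = mathematics }} A researcher known for...",
]
train_labels = [1, 0]  # 1 = geographic article, 0 = not geographic
classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_texts, train_labels)

# 2. Simplified infobox field extraction (real infoboxes vary a lot).
LAT_RE = re.compile(r"\|\s*latitude\s*=\s*([-\d.]+)")
LON_RE = re.compile(r"\|\s*longitude\s*=\s*([-\d.]+)")

def extract_coords(wikitext):
    """Return (lat, lon) if both fields are present, else None."""
    lat, lon = LAT_RE.search(wikitext), LON_RE.search(wikitext)
    if lat and lon:
        return float(lat.group(1)), float(lon.group(1))
    return None

# 3. Stream the dump with iterparse so the large XML file is never fully loaded.
def iter_pages(dump_path):
    """Yield (title, wikitext) pairs from a MediaWiki XML export."""
    for _, elem in ET.iterparse(dump_path, events=("end",)):
        if elem.tag.endswith("page"):
            title = text = ""
            for child in elem.iter():
                if child.tag.endswith("title"):
                    title = child.text or ""
                elif child.tag.endswith("text"):
                    text = child.text or ""
            yield title, text
            elem.clear()  # release the processed subtree

# 4. Store extracted rows in SQLite.
def build_database(dump_path, db_path="geo.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS place(name TEXT, lat REAL, lon REAL)")
    for title, text in iter_pages(dump_path):
        if classifier.predict([text])[0] == 1:
            coords = extract_coords(text)
            if coords:
                con.execute("INSERT INTO place VALUES (?, ?, ?)", (title, *coords))
    con.commit()
    con.close()

if __name__ == "__main__":
    build_database("enwiki-sample.xml")  # path to a (partial) dump, illustrative
```

The paper splits the extracted fields over several tables and joins them to structure the result; the sketch collapses that into a single `place` table for brevity.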