A new clustering approach to identify the values to query the deep web access forms
Yasser Saissi, A. Zellou, A. Idri
2018 4th International Conference on Computer and Technology Applications (ICCTA), published 2018-05-01
DOI: 10.1109/CATA.2018.8398666
Citations: 1
Abstract
The deep web is a large part of the web that is accessible only by querying its access forms. To query these forms, we need to know the possible values of each form field. However, some form fields have no predefined set of values, which makes querying them automatically difficult or impossible. In this paper, we propose a new approach that identifies the set of possible values for these fields so that the deep web access forms can be queried. First, we query these fields with values associated with the domain of the deep web source. Then, we use the K-medoids clustering algorithm to group the returned results into K clusters, based on the semantic similarity between these results. The elements of the resulting clusters define the set of possible values for the analyzed fields. With this approach, we can issue efficient queries to all the fields of the deep web access forms and access the information they hide.
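The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the semantic similarity measure, the K-medoids variant, the initialization, or how K is chosen. Here a character-bigram Jaccard similarity (the hypothetical helper jaccard_bigram_similarity) stands in for the semantic measure, distance is taken as 1 minus similarity, and the hypothetical k_medoids function uses the simple alternating assignment/update heuristic rather than the full PAM swap search. The sample strings are invented results of probing a hypothetical free-text "city" field.

```python
from typing import Callable, Dict, List, Sequence


def jaccard_bigram_similarity(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1] over character bigrams.

    ASSUMPTION: the paper uses a semantic similarity measure that the
    abstract does not specify; this lexical measure only stands in for it.
    """
    def bigrams(s: str) -> set:
        s = s.lower()
        # Strings shorter than two characters fall back to the string itself.
        return {s[i:i + 2] for i in range(len(s) - 1)} or {s}

    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def k_medoids(items: Sequence[str], k: int,
              sim: Callable[[str, str], float],
              max_iter: int = 100) -> Dict[str, List[str]]:
    """Group items into k clusters with the alternating K-medoids heuristic.

    Distance is 1 - similarity. Returns {medoid: [cluster members]}.
    """
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    dist = [[1.0 - sim(items[i], items[j]) for j in range(n)] for i in range(n)]

    # Deterministic initialization (an assumption): start from the most
    # central item, then greedily add the item farthest from chosen medoids.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max((i for i in range(n) if i not in medoids),
                           key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        # Assignment step: each medoid keeps itself (so no cluster is empty);
        # every other item joins its nearest medoid.
        clusters: Dict[int, List[int]] = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the member with the smallest total distance to the
        # rest of its cluster becomes that cluster's new medoid.
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids

    return {items[m]: [items[i] for i in members] for m, members in clusters.items()}


if __name__ == "__main__":
    # Hypothetical results returned when probing a free-text "city" field.
    results = ["paris", "parisian", "paris france", "london", "londoner", "london uk"]
    for medoid, members in k_medoids(results, 2, jaccard_bigram_similarity).items():
        print(medoid, "->", members)
```

With this toy input the sketch returns two clusters, {"london": ["london", "londoner", "london uk"], "parisian": ["paris", "parisian", "paris france"]}. The abstract does not say exactly how cluster elements are turned into the field's value set; using all members, or only the medoids, as candidate query values is one possible reading. A genuinely semantic measure (for example, one based on WordNet or word embeddings) could replace the lexical placeholder without changing the clustering code.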