Finite Approximations and Similarity of Languages
B. Rovan, A. Varga
Int. J. Found. Comput. Sci., journal article, published 2022-05-28
DOI: 10.1142/s0129054122500113 (https://doi.org/10.1142/s0129054122500113)
Abstract
A new framework for measuring distances (similarity) between formal languages and between grammars, based on distances between words, is introduced. It approximates languages by their finite subsets and uses monotone sequences of such finite approximations to define an infinite language in the limit. Distances between finite languages are defined and then extended to distances between monotone sequences of finite languages, which yields distances between infinite languages. The framework captures several distances studied in the literature. Context-free grammars with energy are introduced to enable finite approximations that emphasize “syntactically important” parts of words. Grammars with energy are also used to extend distances between monotone sequences of finite languages to distances between context-free grammars. A basic toolkit for monotone sequences of finite languages and for distances between languages and between grammars is provided. As part of this toolkit, a non-symmetric version of distances is defined, which provides an additional characterisation of distances in general. Further properties of distances between grammars are derived by restricting the “energy use” of grammars with energy. Finally, methods for estimating distances are presented for cases where the distance is not computable or is difficult to compute.
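The abstract does not state the paper's actual definitions, so the following Python sketch only illustrates the general idea under assumptions of our own. It takes the finite approximation L_n of a language to be all of its words of length at most n, uses Levenshtein edit distance as the distance between words, and uses the Hausdorff distance as the distance between two finite languages. The one-sided `directed` function is included as one simple example of a non-symmetric distance. The names `approximation`, `directed`, `hausdorff`, `in_anbn` and `in_astar_bstar` are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import Callable, FrozenSet


def levenshtein(u: str, v: str) -> int:
    """Edit distance: minimum number of single-letter insertions, deletions, substitutions."""
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        cur = [i]
        for j, b in enumerate(v, start=1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute a -> b (free if equal)
        prev = cur
    return prev[-1]


def approximation(member: Callable[[str], bool], alphabet: str, n: int) -> FrozenSet[str]:
    """Hypothetical finite approximation L_n: all words of length <= n in the language."""
    words = ("".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k))
    return frozenset(w for w in words if member(w))


def directed(A: FrozenSet[str], B: FrozenSet[str]) -> int:
    """One-sided (non-symmetric) distance: how far the worst word of A is from B."""
    return max(min(levenshtein(x, y) for y in B) for x in A)


def hausdorff(A: FrozenSet[str], B: FrozenSet[str]) -> int:
    """Symmetric Hausdorff distance between two non-empty finite languages."""
    if not A or not B:
        raise ValueError("both finite languages must be non-empty")
    return max(directed(A, B), directed(B, A))


# Example languages over {a, b}: L1 = { a^k b^k : k >= 0 } and L2 = a*b*.
def in_anbn(w: str) -> bool:
    k = len(w) // 2
    return w == "a" * k + "b" * k


def in_astar_bstar(w: str) -> bool:
    return "ba" not in w


# Distances between the n-th approximations of L1 and L2 for n = 0..4.
distances = [hausdorff(approximation(in_anbn, "ab", n),
                       approximation(in_astar_bstar, "ab", n)) for n in range(5)]
print(distances)  # [0, 1, 1, 2, 2]
```

Because L1 is a subset of L2 here, the one-sided distance from L1_n to L2_n is always 0, while the distance in the other direction grows with n (words such as a^n in L2 move further from every word of L1). How the paper turns such a sequence into a single distance between the infinite languages is not described in the abstract and is not modelled in this sketch.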