Anthony B Garza, Rolando Garcia, Marc S Halfon, Hani Z Girgis
Title: Evaluation of metric and representation learning approaches: Effects of representations driven by relative distance on the performance.
Venue: 2023 Intelligent Methods, Systems, and Applications
DOI: 10.1109/imsa58542.2023.10217475
Published: 2023-07-01 (Epub 2023-08-24)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10566582/pdf/nihms-1935619.pdf
Citations: 0
Abstract
Several deep neural network architectures have emerged recently for metric learning. We asked which architecture is the most effective in measuring the similarity or dissimilarity among images. To this end, we evaluated six networks on a standard image set: variational autoencoders, Siamese networks, triplet networks, and variational autoencoders combined with Siamese or triplet networks. These networks were compared to a baseline network consisting of multiple separable convolutional layers. Our study revealed the following: (i) the triplet architecture proved the most effective one because it learns a relative distance, not an absolute distance; (ii) combining autoencoders with networks that learn metrics (e.g., Siamese or triplet networks) is unwarranted; and (iii) an architecture based on separable convolutional layers is a reasonably simple alternative to triplet networks. These results can potentially impact our field by encouraging architects to develop advanced networks that take advantage of separable convolution and relative distance.
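To illustrate the relative-distance property the abstract credits for the triplet architecture's effectiveness, the following is a minimal sketch of the standard triplet margin loss (not the paper's implementation; the embeddings, margin value, and function name are illustrative assumptions). The loss is zero whenever the anchor is closer to the positive than to the negative by at least a margin, regardless of the absolute scale of the distances.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet margin loss: penalize when the anchor-positive
    # distance fails to undercut the anchor-negative distance by `margin`.
    # Only the *relative* gap between the two distances matters.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D embeddings where the positive already sits much
# closer to the anchor than the negative does:
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to anchor
n = np.array([3.0, 0.0])   # far from anchor
print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```

Swapping the roles of `p` and `n` yields a positive loss, which would push the network to reorder the embeddings; no absolute target distance is ever specified.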