Dimensionality reduction to improve search time and memory footprint in content-retrieval tasks: Application to semiconductor inspection images
Thomas Vial, Farah Dhouib, Louison Roger, Annabelle Blangero, Frédéric Duvivier, Karim Sayadi, Marisa N. Faraggi
Advances in Industrial and Manufacturing Engineering, volume 5, Article 100097. Published 2022-11-01. DOI: 10.1016/j.aime.2022.100097
Abstract
Quality control is a crucial step in producing high-quality semiconductor microchips. In recent years, advances in artificial vision have significantly improved image-based quality control techniques. In the semiconductor industry, automated visual inspection is fundamental to avoiding human intervention and keeping the pipeline sanitized. Different types of images are collected during this process, feeding image databases that grow continually and cannot be exhaustively labelled by humans. Advances in image retrieval methods are therefore fundamental to developing more efficient techniques that meet user requirements.
In this work we propose applying dimensionality reduction to the feature vectors computed by a deep learning classification model, while maintaining high retrieval performance. To validate this technique, we evaluate four well-known reduction algorithms on a subset of the full database: Principal Component Analysis (PCA), Sparse Random Projection (SRP), Isomap, and Locally Linear Embedding (LLE), in combination with three similarity metrics: Euclidean ($L_2$), cosine, and inner product. As the number of vector components is reduced, image retrieval performance is measured by recall, search time, and the memory footprint of the database.
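The evaluation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the search is brute force, and recall is assumed here to mean the average overlap between the top-k neighbours found in the reduced space and those found in the full space, which may differ from the definition used in the paper. The function names (`pca_fit`, `pca_transform`, `top_k`, `recall_at_k`) and all sizes are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean of X and its top principal axes (rows of X are vectors)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, axes):
    """Project vectors onto the retained principal axes."""
    return (X - mean) @ axes.T

def top_k(queries, database, k, metric="l2"):
    """Brute-force search; returns neighbour indices of shape (n_queries, k)."""
    if metric == "l2":
        # Negated squared Euclidean distance, so that larger means closer.
        scores = -(
            (queries ** 2).sum(axis=1)[:, None]
            - 2.0 * queries @ database.T
            + (database ** 2).sum(axis=1)[None, :]
        )
    elif metric == "cosine":
        q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        d = database / np.linalg.norm(database, axis=1, keepdims=True)
        scores = q @ d.T
    elif metric == "ip":
        scores = queries @ database.T
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(-scores, axis=1)[:, :k]

def recall_at_k(reference, candidate):
    """Average fraction of reference neighbours also present in candidate."""
    hits = [len(set(r) & set(c)) / len(r) for r, c in zip(reference, candidate)]
    return float(np.mean(hits))

# Synthetic stand-in for feature vectors (assumption): 16 latent factors
# embedded in 256 dimensions, plus a small amount of noise.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(16, 256))
database = rng.normal(size=(2000, 16)) @ mixing + 0.01 * rng.normal(size=(2000, 256))
queries = rng.normal(size=(50, 16)) @ mixing + 0.01 * rng.normal(size=(50, 256))

# Fit PCA on the database only, then project both sets to 16 components.
mean, axes = pca_fit(database, n_components=16)
database_red = pca_transform(database, mean, axes)
queries_red = pca_transform(queries, mean, axes)

reference = top_k(queries, database, k=10, metric="l2")
candidate = top_k(queries_red, database_red, k=10, metric="l2")
recall = recall_at_k(reference, candidate)
print(f"recall@10 after 256 -> 16 components: {recall:.3f}")
```

Swapping `metric` for `"cosine"` or `"ip"` exercises the other two similarity measures. Note that mean-centering changes inner-product rankings, so a faithful comparison for that metric would need more care than this sketch takes.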
PCA offers the best results, allowing a significant reduction in search time and memory usage, while SRP becomes an option only when the cosine distance is used. With PCA, we were able to divide the memory footprint by a factor of 16 and the search time by a factor of 6, while maintaining an average recall of 0.96.
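The memory figure follows from simple arithmetic when vectors are stored densely: the footprint is proportional to the number of components, so keeping one sixteenth of the components divides it by 16. The sketch below assumes 32-bit floats, and the database size and original dimensionality are hypothetical, since the abstract does not state them.

```python
def footprint_bytes(n_vectors, n_components, bytes_per_component=4):
    """Size of a dense vector store, assuming float32 components by default."""
    return n_vectors * n_components * bytes_per_component

n_vectors = 1_000_000          # hypothetical database size
full_dim = 2048                # hypothetical original dimensionality
reduced_dim = full_dim // 16   # keep one sixteenth of the components

full = footprint_bytes(n_vectors, full_dim)
reduced = footprint_bytes(n_vectors, reduced_dim)
ratio = full / reduced

print(f"full: {full / 2**30:.2f} GiB, reduced: {reduced / 2**30:.2f} GiB")
print(f"reduction factor: {ratio}")  # → 16.0
```

Search time does not necessarily shrink by the same factor as storage; the abstract reports a factor of 6 rather than 16, and this arithmetic does not account for that difference.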