{"title":"Review of Recent Distillation Studies","authors":"Min Gao","doi":"10.1051/matecconf/202338201034","DOIUrl":null,"url":null,"abstract":"Knowledge distillation has gained a lot of interest in recent years because it allows for compressing a large deep neural network (teacher DNN) into a smaller DNN (student DNN), while maintaining its accuracy. Recent improvements have been made to knowledge distillation. One such improvement is the teaching assistant distillation method. This method involves introducing an intermediate \"teaching assistant\" model between the teacher and student. The teaching assistant is first trained to mimic the teacher, and then the student is trained to mimic the teaching assistant. This multi-step process can improve student performance. Another improvement to knowledge distillation is curriculum distillation. This method involves gradually training the student by exposing it to increasingly difficult concepts over time, similar to curriculum learning in humans. This process can help the student learn in a more stable and consistent manner. Finally, there is the mask distillation method. Here, the student is trained to specifically mimic the attention mechanisms learned by the teacher, not just the overall output of the teacher DNN. These improvements help to enhance the knowledge distillation process and enable the creation of more efficient DNNs.","PeriodicalId":18309,"journal":{"name":"MATEC Web of Conferences","volume":"137 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"MATEC Web of Conferences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1051/matecconf/202338201034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Knowledge distillation has attracted considerable interest in recent years because it compresses a large deep neural network (the teacher DNN) into a smaller one (the student DNN) while largely preserving its accuracy. Several recent improvements to knowledge distillation have been proposed. One is teaching assistant distillation, which introduces an intermediate "teaching assistant" model between the teacher and the student: the teaching assistant is first trained to mimic the teacher, and the student is then trained to mimic the teaching assistant. This multi-step process can improve student performance, particularly when the capacity gap between teacher and student is large. Another improvement is curriculum distillation, in which the student is trained gradually on increasingly difficult concepts over time, much like curriculum learning in humans; this helps the student learn in a more stable and consistent manner. Finally, mask distillation trains the student to mimic the attention mechanisms learned by the teacher, rather than only the teacher DNN's overall output. Together, these improvements enhance the knowledge distillation process and enable the creation of more efficient DNNs.
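To make the ideas above concrete, the sketch below shows a minimal, hedged illustration (not the paper's code) of the standard knowledge distillation objective and how teaching assistant distillation chains two distillation stages. The model names (big_net, mid_net, small_net), the data loader, and all hyperparameters are hypothetical placeholders chosen for the example.

```python
# Minimal sketch of knowledge distillation (KD) and the teaching-assistant
# variant. Assumes PyTorch; all models and the data loader are placeholders.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard KD objective: soft teacher targets plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # scale by T^2 to keep gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def distill(teacher, student, loader, epochs=10, lr=1e-3):
    """Train `student` to mimic a frozen `teacher` on the data in `loader`."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            loss = distillation_loss(student(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student


# Teaching-assistant distillation: distil teacher -> assistant, then
# assistant -> student, instead of teacher -> student directly.
# `big_net`, `mid_net`, `small_net`, and `train_loader` are assumed to exist.
# assistant = distill(big_net, mid_net, train_loader)
# student = distill(assistant, small_net, train_loader)
```

Curriculum distillation would reorder the batches in `loader` from easy to hard, and mask distillation would add a term matching the student's attention maps to the teacher's; both fit the same training loop.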
Journal description:
MATEC Web of Conferences is an Open Access publication series dedicated to archiving conference proceedings covering all fundamental and applied research related to materials science, engineering, and chemistry. All engineering disciplines fall within the aims and scope of the journal: civil, naval, mechanical, chemical, and electrical engineering, as well as nanotechnology and metrology. The journal also covers all materials with regard to their physical-chemical characterization, implementation, and resistance in their environment. Other subdisciplines of chemistry (analytical chemistry, petrochemistry, organic chemistry, and so on), and even pharmacology, are also welcome. MATEC Web of Conferences offers a wide range of services, from organizing the submission of conference proceedings to the worldwide dissemination of the conference papers. It provides an efficient archiving solution, ensuring maximum exposure and wide indexing of scientific conference proceedings. Proceedings are published under the scientific responsibility of the conference editors.