Paper Reading: A Comprehensive Overhaul of Feature Distillation (Heo et al.)


We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher transform, student transform, distillation feature position and distance function. Our proposed distillation loss includes a feature transform with a newly designed margin ReLU, a new distillation feature position, and a partial L2 distance function to skip redundant information giving adverse effects to the compression of student. In ImageNet, our proposed method achieves 21.65% of top-1 error with ResNet50, which outperforms the performance of the teacher network, ResNet152. Our proposed method is evaluated on various tasks such as image classification, object detection and semantic segmentation and achieves a significant performance improvement in all tasks.




Hint learning (FitNets) does not make good use of feature distillation; most of its accuracy gain still comes from output distillation.

After FitNets, variant methods of feature distillation have been proposed as follows. The methods in [30, 28] transform the feature into a representation having a reduced dimension and transfer it to the student. In spite of the reduced dimension, it has been reported that the abstracted feature representation does lead to an improved performance. Recent methods (FT [13], AB [7]) have been proposed to increase the amount of transferred information in distillation. FT [13] encodes the feature into a ‘factor’ using an auto-encoder to alleviate the leakage of information. AB [7] focuses on activation of a network with only the sign of features being transferred. Both methods show a better distillation performance by increasing the amount of transferred information. However, FT [13] and AB [7] deform feature values of the teacher, which leaves a further room for the performance to be improved.



In this paper, we further improve the performance of feature distillation by proposing a new feature distillation loss which is designed via investigation of various design aspects: teacher transform, student transform, distillation feature position and distance function. Our method aims to transfer two factors from features. The first target is the magnitude of feature response after ReLU, since it carries most of the feature information. The second is the activation status of each neuron. Recent studies [20, 7] have shown that the activation of neurons strongly represents the expressiveness of a network, and it should be considered in distillation. To this purpose, we propose a margin ReLU function, change the distillation feature position to the front of ReLU, and use a partial L2 distance function to skip the distillation of unnecessary information. The proposed loss significantly improves performance of feature distillation. In our experiments, we have evaluated our proposed method in various domains including classification (CIFAR [15], ImageNet [23]), object detection (PASCAL VOC [2]) and semantic segmentation (PASCAL VOC). As shown in Fig. 1, in our experiments, the proposed method shows a performance superior to the existing state-of-the-art methods and even the teacher model.






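The paper proposes a margin ReLU activation function and uses a partial L2 distance as the distance measure, so that unnecessary information is skipped during distillation.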
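The partial L2 distance mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the skip condition is "the teacher target is non-positive and the student value is already at or below it", which is how I read the paper's description of skipping unnecessary information. The function name `partial_l2` and the sample values are hypothetical.

```python
def partial_l2(teacher, student):
    """Sum of squared differences over two equal-length feature lists.

    Assumed skip rule: a position contributes nothing when
    student <= teacher <= 0, i.e. the teacher is inactive and the
    student is already at least as inactive.
    """
    total = 0.0
    for t, s in zip(teacher, student):
        if s <= t <= 0.0:
            continue  # nothing useful to transfer at this position
        total += (t - s) ** 2
    return total


# Hypothetical pre-ReLU feature values (teacher side already transformed).
teacher = [1.0, -0.5, -0.5, 2.0]
student = [0.5, -2.0, 0.25, 2.0]
print(partial_l2(teacher, student))  # 0.8125
```

Only the first and third positions contribute here: the second is skipped because the student is already below a negative teacher target, and the fourth matches exactly.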


Teacher transform

Methods such as hint learning transform the teacher's features before using them to guide the student, which loses information from the teacher.


Student transform


Distillation feature position



Distance function



Distillation position



Loss function


Here m is a margin value less than zero. We name this function margin ReLU.
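The formula itself did not survive in these notes. As I recall it from the paper, the margin ReLU is max(x, m): positive responses pass through unchanged and negative responses are clamped to the margin m instead of to zero. A minimal sketch under that assumption, with a hypothetical function name and sample values:

```python
def margin_relu(x, m):
    """Margin ReLU sketch: keep x when x >= m, otherwise clamp it to m."""
    if m >= 0:
        raise ValueError("the margin m is expected to be less than zero")
    return max(x, m)


# Hypothetical teacher pre-ReLU responses and a hypothetical margin of -1.0.
features = [2.0, 0.5, -0.25, -3.0]
print([margin_relu(x, m=-1.0) for x in features])  # [2.0, 0.5, -0.25, -1.0]
```

Only the strongly negative response (-3.0) is changed; values between m and zero are kept as they are.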


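The expectation can be computed directly during training, or from the parameters of the preceding batch normalization layer. In the proposed method, the margin ReLU σ_{m_C}(·) is used as the teacher transform T_t to generate the target feature values for the student network. For the student transform, a regressor consisting of a 1 × 1 convolution layer and a batch normalization layer is used.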
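Both ways of obtaining the expectation can be sketched as follows. The sketch assumes that the channel margin is the expected value of the negative pre-ReLU responses of that channel (my reading of the paper; the defining equation is missing from these notes), and that the output of a batch normalization layer is roughly Gaussian with the layer's shift as mean and the magnitude of its scale as standard deviation. The function names are hypothetical.

```python
import math


def margin_from_samples(values):
    """Empirical estimate: mean of the negative responses of one channel."""
    negatives = [v for v in values if v < 0.0]
    if not negatives:
        return 0.0  # no negative response observed; fallback chosen for this sketch
    return sum(negatives) / len(negatives)


def margin_from_bn(mean, std):
    """Closed form E[x | x < 0] for x ~ N(mean, std^2) (truncated normal mean).

    Assumption: mean and std stand for the BN shift and the absolute BN scale.
    No guard is included for the case where mean / std is so large that the
    Gaussian CDF underflows to zero.
    """
    z = -mean / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mean - std * pdf / cdf


print(margin_from_samples([1.0, -1.0, -3.0]))  # -2.0
print(margin_from_bn(0.0, 1.0))                # about -0.798
```

The closed form avoids a pass over the training data, at the cost of relying on the Gaussian assumption.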



Batch normalization






We propose a new knowledge distillation method along with several investigations about various aspects of the existing feature distillation methods. We have discovered the effectiveness of pre-ReLU location and proposed a new loss function to improve the performance of feature distillation.

The new loss function consists of a teacher transform (margin ReLU) and a new distance function (partial L2) and enables an effective feature distillation at pre-ReLU location. We have also investigated about the mode of batch normalization in teacher network and achieved additional performance improvements. Through experiments, we examined the performance of the proposed method using various networks in various tasks, and proved that the proposed method substantially outperforms the state-of-the-arts of feature distillation.




