MindSpore 25-Day Learning Camp, Day 18 | Garbage Classification Based on MobileNetV2

MobileNetV2 is a lightweight deep neural network designed for mobile and embedded devices. Its core idea is to reduce computational complexity and parameter count through depthwise separable convolutions and inverted residuals. Its main features include:

  • Depthwise separable convolution: decomposes a standard convolution into a depthwise convolution and a pointwise convolution, greatly reducing computation (see the cost sketch after this list).
  • Inverted residual structure: expands the feature dimensions first and compresses them afterwards, making the network easier to train and optimize while maintaining strong performance.
  • Linear bottleneck: uses a linear activation at the end of each inverted residual block to avoid information loss.
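
A quick back-of-the-envelope comparison makes the savings concrete. The sketch below (not part of the tutorial; the sizes are arbitrary) counts multiply-accumulate operations for a standard 3x3 convolution versus its depthwise-separable factorization:

    # Rough MAC count for a standard KxK convolution vs. the
    # depthwise-separable factorization MobileNetV2 is built on.
    def conv_cost(h, w, c_in, c_out, k=3):
        standard = h * w * c_in * c_out * k * k   # dense KxK convolution
        depthwise = h * w * c_in * k * k          # one KxK filter per channel
        pointwise = h * w * c_in * c_out          # 1x1 channel mixing
        return standard, depthwise + pointwise

    std, sep = conv_cost(56, 56, 64, 128)
    print(f"standard: {std:,}  separable: {sep:,}  ratio: {sep / std:.3f}")
    # ratio = 1/c_out + 1/k^2 ≈ 0.119, roughly an 8x reduction here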

MobileNetV2 Architecture

MobileNetV2 is composed of multiple inverted residual blocks, each containing:

  • Pointwise convolution: a 1x1 kernel expands the feature dimensions.
  • Depthwise convolution: a 3x3 kernel convolves each channel independently.
  • Pointwise convolution: another 1x1 kernel compresses the feature dimensions back down.
  • Skip connection: the input is added to the output (when their feature dimensions match), forming a residual connection.

    Stacked together, these blocks form a deep neural network that extracts image features efficiently; a worked channel count for one block follows below.
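
To make the shapes concrete, here is the channel arithmetic for one stride-1 block, using the t=6, c=24 row of the default configuration defined later in the code (a rough count that ignores BatchNorm parameters):

    # Channel arithmetic for one inverted residual block (t=6, c=24).
    inp, oup, t, k = 24, 24, 6, 3
    hidden = inp * t                 # 1x1 expansion: 24 -> 144 channels
    expand = inp * hidden            # weights of the expanding 1x1 conv
    depthwise = hidden * k * k       # 3x3 depthwise: one filter per channel
    project = hidden * oup           # weights of the linear 1x1 projection
    print(hidden, expand + depthwise + project)   # 144, 8208 weights total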

Workflow of the Garbage Classification System

Once a loss function is defined, we can compute the gradient of the loss with respect to the weights. The gradient tells the optimizer in which direction to update the weights so that model performance improves.

Before training, the parameters of the MobileNetV2Backbone layer are frozen so that this module's weights are not updated during training; only the parameters of the MobileNetV2Head module are updated (see the sketch below).
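
A minimal sketch of that freezing step (assuming `backbone` is the MobileNetV2Backbone instance built from the code later in this post):

    # Freeze every backbone weight; parameters with requires_grad=False are
    # excluded when the optimizer is built from trainable_params().
    for param in backbone.get_parameters():
        param.requires_grad = False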

Loss functions supported by MindSpore include SoftmaxCrossEntropyWithLogits, L1Loss, MSELoss, and others. Here we use SoftmaxCrossEntropyWithLogits, instantiated as shown below.
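
Instantiating it is a one-liner; `sparse=True` below assumes integer class labels rather than one-hot vectors, with mean reduction over the batch:

    import mindspore.nn as nn

    loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')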

The loss value is printed during training and testing. It fluctuates, but overall it decreases steadily while accuracy rises. Because of randomness, the loss values from different runs will not be exactly the same.

After each epoch, the model computes its accuracy on the test set; the printed accuracy values show that the predictive ability of the MobileNetV2 model keeps improving. A sketch of this evaluation step follows.
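
A sketch of that per-epoch evaluation with the high-level Model API (`network`, `loss_fn`, `opt`, and `eval_dataset` stand in for the objects built elsewhere in this post):

    from mindspore import Model

    model = Model(network, loss_fn=loss_fn, optimizer=opt, metrics={'acc'})
    acc = model.eval(eval_dataset)   # returns a dict such as {'acc': ...}
    print(acc)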

  1. Data Preprocessing

    • Load the image data and resize it to a uniform size (e.g., 224x224).
    • Normalize pixel values to the range 0 to 1.
    • Convert the labels to one-hot encoding.


      The garbage classification dataset is read with the ImageFolderDataset method and then processed as a whole.

      When reading the dataset, the training and test splits are specified. First the whole dataset is normalized and its image channels adjusted, among other preprocessing steps. The training data then goes through RandomCropDecodeResize, RandomHorizontalFlip, RandomColorAdjust, and shuffle operations to enrich the training data; the test set goes through Decode, Resize, and CenterCrop. Finally the processed datasets are returned; a pipeline sketch follows below.
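
      A minimal sketch of that training pipeline (the path, batch size, and augmentation parameters below are placeholders, not the tutorial's exact configuration):

        import mindspore.dataset as ds
        import mindspore.dataset.vision as vision

        mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
        std = [0.229 * 255, 0.224 * 255, 0.225 * 255]

        train_ds = ds.ImageFolderDataset('data_en/train', shuffle=True)
        train_trans = [
            vision.RandomCropDecodeResize(224),     # decode + random crop + resize
            vision.RandomHorizontalFlip(prob=0.5),
            vision.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4),
            vision.Normalize(mean=mean, std=std),
            vision.HWC2CHW(),                       # channel-last -> channel-first
        ]
        train_ds = train_ds.map(operations=train_trans, input_columns='image')
        train_ds = train_ds.batch(32, drop_remainder=True)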

  2. Model Construction

    • Use a pretrained MobileNetV2 model as the feature extractor, with the top classification layers removed.
    • Add a global average pooling layer to turn the feature maps into a fixed-length feature vector.
    • Add a fully connected layer and an output layer whose size matches the number of classes in the task.
      import math

      import numpy as np
      import mindspore.nn as nn
      from mindspore import Tensor
      from mindspore import ops as P

      __all__ = ['MobileNetV2', 'MobileNetV2Backbone', 'MobileNetV2Head', 'mobilenet_v2']
      
      def _make_divisible(v, divisor, min_value=None):
          if min_value is None:
              min_value = divisor
          new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
          if new_v < 0.9 * v:
              new_v += divisor
          return new_v
      
      class GlobalAvgPooling(nn.Cell):
          """
          Global avg pooling definition.
      
          Args:
      
          Returns:
              Tensor, output tensor.
      
          Examples:
              >>> GlobalAvgPooling()
          """
      
          def __init__(self):
              super(GlobalAvgPooling, self).__init__()
      
          def construct(self, x):
              x = P.mean(x, (2, 3))
              return x
      
      class ConvBNReLU(nn.Cell):
          """
          Convolution/Depthwise fused with Batchnorm and ReLU block definition.
      
          Args:
              in_planes (int): Input channel.
              out_planes (int): Output channel.
              kernel_size (int): Input kernel size.
              stride (int): Stride size for the first convolutional layer. Default: 1.
              groups (int): Channel group. 1 for standard convolution, equal to the input channel count for depthwise. Default: 1.
      
          Returns:
              Tensor, output tensor.
      
          Examples:
              >>> ConvBNReLU(16, 256, kernel_size=1, stride=1, groups=1)
          """
      
          def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
              super(ConvBNReLU, self).__init__()
              padding = (kernel_size - 1) // 2
              in_channels = in_planes
              out_channels = out_planes
              if groups == 1:
                  conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad_mode='pad', padding=padding)
              else:
                  out_channels = in_planes
                  conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad_mode='pad',
                                   padding=padding, group=in_channels)
      
              layers = [conv, nn.BatchNorm2d(out_planes), nn.ReLU6()]
              self.features = nn.SequentialCell(layers)
      
          def construct(self, x):
              output = self.features(x)
              return output
      
      class InvertedResidual(nn.Cell):
          """
          Mobilenetv2 residual block definition.
      
          Args:
              inp (int): Input channel.
              oup (int): Output channel.
              stride (int): Stride size for the first convolutional layer. Default: 1.
              expand_ratio (int): expand ratio of the input channel
      
          Returns:
              Tensor, output tensor.
      
          Examples:
              >>> InvertedResidual(3, 256, 1, 1)
          """
      
          def __init__(self, inp, oup, stride, expand_ratio):
              super(InvertedResidual, self).__init__()
              assert stride in [1, 2]
      
              hidden_dim = int(round(inp * expand_ratio))
              self.use_res_connect = stride == 1 and inp == oup
      
              layers = []
              if expand_ratio != 1:
                  layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
              layers.extend([
                  ConvBNReLU(hidden_dim, hidden_dim,
                             stride=stride, groups=hidden_dim),
                  nn.Conv2d(hidden_dim, oup, kernel_size=1,
                            stride=1, has_bias=False),
                  nn.BatchNorm2d(oup),
              ])
              self.conv = nn.SequentialCell(layers)
              self.cast = P.Cast()
      
          def construct(self, x):
              identity = x
              x = self.conv(x)
              if self.use_res_connect:
                  return P.add(identity, x)
              return x
      
      class MobileNetV2Backbone(nn.Cell):
          """
          MobileNetV2 backbone architecture.
      
          Args:
              width_mult (float): Channel width multiplier, rounded to a multiple of round_nearest. Default is 1.
              inverted_residual_setting (list): Inverted residual settings. Default is None.
              round_nearest (int): Round channel counts to a multiple of this value. Default is 8.
              input_channel (int): Channel count of the first convolution layer. Default is 32.
              last_channel (int): Channel count of the last 1x1 convolution. Default is 1280.
          Returns:
              Tensor, output tensor.
      
          Examples:
              >>> MobileNetV2Backbone()
          """
      
          def __init__(self, width_mult=1., inverted_residual_setting=None, round_nearest=8,
                       input_channel=32, last_channel=1280):
              super(MobileNetV2Backbone, self).__init__()
              block = InvertedResidual
              # setting of inverted residual blocks
              self.cfgs = inverted_residual_setting
              if inverted_residual_setting is None:
                  self.cfgs = [
                      # t, c, n, s
                      [1, 16, 1, 1],
                      [6, 24, 2, 2],
                      [6, 32, 3, 2],
                      [6, 64, 4, 2],
                      [6, 96, 3, 1],
                      [6, 160, 3, 2],
                      [6, 320, 1, 1],
                  ]
      
              # building first layer
              input_channel = _make_divisible(input_channel * width_mult, round_nearest)
              self.out_channels = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
              features = [ConvBNReLU(3, input_channel, stride=2)]
              # building inverted residual blocks
              for t, c, n, s in self.cfgs:
                  output_channel = _make_divisible(c * width_mult, round_nearest)
                  for i in range(n):
                      stride = s if i == 0 else 1
                      features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                      input_channel = output_channel
              features.append(ConvBNReLU(input_channel, self.out_channels, kernel_size=1))
              self.features = nn.SequentialCell(features)
              self._initialize_weights()
      
          def construct(self, x):
              x = self.features(x)
              return x
      
          def _initialize_weights(self):
              """
              Initialize weights.
      
              Args:
      
              Returns:
                  None.
      
              Examples:
                  >>> _initialize_weights()
              """
              self.init_parameters_data()
              for _, m in self.cells_and_names():
                  if isinstance(m, nn.Conv2d):
                      n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                      m.weight.set_data(Tensor(np.random.normal(0, np.sqrt(2. / n),
                                                                m.weight.data.shape).astype("float32")))
                      if m.bias is not None:
                          m.bias.set_data(
                              Tensor(np.zeros(m.bias.data.shape, dtype="float32")))
                  elif isinstance(m, nn.BatchNorm2d):
                      m.gamma.set_data(
                          Tensor(np.ones(m.gamma.data.shape, dtype="float32")))
                      m.beta.set_data(
                          Tensor(np.zeros(m.beta.data.shape, dtype="float32")))
      
          @property
          def get_features(self):
              return self.features
      
      class MobileNetV2Head(nn.Cell):
          """
          MobileNetV2 head (classifier) architecture.
      
          Args:
              input_channel (int): Channel count coming from the backbone. Default is 1280.
              num_classes (int): Number of classes. Default is 1000.
              has_dropout (bool): Whether dropout is used before the final Dense layer. Default is False.
              activation (str): Optional output activation, "Sigmoid" or "Softmax". Default is "None" (no activation).
          Returns:
              Tensor, output tensor.
      
          Examples:
              >>> MobileNetV2Head()
          """
      
          def __init__(self, input_channel=1280, num_classes=1000, has_dropout=False, activation="None"):
              super(MobileNetV2Head, self).__init__()
              # mobilenet head
              # Note: in MindSpore 2.x the first argument of nn.Dropout is the drop
              # probability p (it was keep_prob in 1.x).
              head = ([GlobalAvgPooling(), nn.Dense(input_channel, num_classes, has_bias=True)] if not has_dropout else
                      [GlobalAvgPooling(), nn.Dropout(0.2), nn.Dense(input_channel, num_classes, has_bias=True)])
              self.head = nn.SequentialCell(head)
              self.need_activation = True
              if activation == "Sigmoid":
                  self.activation = nn.Sigmoid()
              elif activation == "Softmax":
                  self.activation = nn.Softmax()
              else:
                  self.need_activation = False
              self._initialize_weights()
      
          def construct(self, x):
              x = self.head(x)
              if self.need_activation:
                  x = self.activation(x)
              return x
      
          def _initialize_weights(self):
              """
              Initialize weights.
      
              Args:
      
              Returns:
                  None.
      
              Examples:
                  >>> _initialize_weights()
              """
              self.init_parameters_data()
              for _, m in self.cells_and_names():
                  if isinstance(m, nn.Dense):
                      m.weight.set_data(Tensor(np.random.normal(
                          0, 0.01, m.weight.data.shape).astype("float32")))
                      if m.bias is not None:
                          m.bias.set_data(
                              Tensor(np.zeros(m.bias.data.shape, dtype="float32")))
          @property
          def get_head(self):
              return self.head
      
      class MobileNetV2(nn.Cell):
          """
          MobileNetV2 architecture.
      
          Args:
              num_classes (int): Number of classes. Default is 1000.
              width_mult (float): Channel width multiplier. Default is 1.
              has_dropout (bool): Whether dropout is used in the head. Default is False.
              inverted_residual_setting (list): Inverted residual settings. Default is None.
              round_nearest (int): Round channel counts to a multiple of this value. Default is 8.
          Returns:
              Tensor, output tensor.
      
          Examples:
              >>> MobileNetV2(num_classes=1000)
          """
      
          def __init__(self, num_classes=1000, width_mult=1., has_dropout=False, inverted_residual_setting=None, \
              round_nearest=8, input_channel=32, last_channel=1280):
              super(MobileNetV2, self).__init__()
              backbone = MobileNetV2Backbone(width_mult=width_mult, \
                  inverted_residual_setting=inverted_residual_setting, \
                  round_nearest=round_nearest, input_channel=input_channel, last_channel=last_channel)
              self.backbone = backbone.get_features
              # Size the head from the backbone's output width; get_features returns
              # the SequentialCell, which carries no out_channels attribute itself.
              self.head = MobileNetV2Head(input_channel=backbone.out_channels, num_classes=num_classes, \
                  has_dropout=has_dropout).get_head
      
          def construct(self, x):
              x = self.backbone(x)
              x = self.head(x)
              return x
      
      class MobileNetV2Combine(nn.Cell):
          """
          MobileNetV2Combine architecture.
      
          Args:
              backbone (Cell): the features extract layers.
              head (Cell):  the fully connected layers.
          Returns:
              Tensor, output tensor.
      
          Examples:
              >>> MobileNetV2Combine(backbone, head)
          """
      
          def __init__(self, backbone, head):
              super(MobileNetV2Combine, self).__init__(auto_prefix=False)
              self.backbone = backbone
              self.head = head
      
          def construct(self, x):
              x = self.backbone(x)
              x = self.head(x)
              return x
      
      def mobilenet_v2(backbone, head):
          return MobileNetV2Combine(backbone, head)
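
    With the classes above in place, assembling the fine-tuning network mirrors the infer() function shown later in this post (the class count below is a placeholder for the garbage dataset):

      # Backbone provides features; the head is sized from its output width.
      backbone = MobileNetV2Backbone()
      head = MobileNetV2Head(input_channel=backbone.out_channels, num_classes=26)
      network = mobilenet_v2(backbone, head)
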
  3. Model Training

    • Freeze the convolutional layers of MobileNetV2 and train only the newly added fully connected layers.
    • Train the model with a cross-entropy loss function and the Adam optimizer.
    • Monitor performance on the validation set, and use early stopping and model checkpoints to save the best model.
      def cosine_decay(total_steps, lr_init=0.0, lr_end=0.0, lr_max=0.1, warmup_steps=0):
          """
          Applies cosine decay to generate learning rate array.
      
          Args:
             total_steps(int): all steps in training.
             lr_init(float): init learning rate.
             lr_end(float): end learning rate
             lr_max(float): max learning rate.
             warmup_steps(int): all steps in warmup epochs.
      
          Returns:
             list, learning rate array.
          """
          lr_init, lr_end, lr_max = float(lr_init), float(lr_end), float(lr_max)
          decay_steps = total_steps - warmup_steps
          lr_all_steps = []
          inc_per_step = (lr_max - lr_init) / warmup_steps if warmup_steps else 0
          for i in range(total_steps):
              if i < warmup_steps:
                  lr = lr_init + inc_per_step * (i + 1)
              else:
                  cosine_decay = 0.5 * (1 + math.cos(math.pi * (i - warmup_steps) / decay_steps))
                  lr = (lr_max - lr_end) * cosine_decay + lr_end
              lr_all_steps.append(lr)
      
          return lr_all_steps
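
    The learning-rate list can be handed straight to a MindSpore optimizer, which accepts a per-step schedule (the step counts and hyper-parameters below are illustrative, not the tutorial's exact values):

      # Cosine schedule over 10 epochs with a 1-epoch warmup, driving a
      # Momentum optimizer that only updates the (unfrozen) head parameters.
      steps_per_epoch = train_ds.get_dataset_size()
      lrs = cosine_decay(total_steps=10 * steps_per_epoch, lr_max=0.01,
                         warmup_steps=steps_per_epoch)
      opt = nn.Momentum(head.trainable_params(), learning_rate=lrs,
                        momentum=0.9, weight_decay=4e-5)
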
  4. Model Training and Testing

    Before formal training begins, define the training function, read the data, instantiate the model, and define the optimizer and loss function.

    First, a brief introduction to the concepts of loss function and optimizer:

    Loss function: also called the objective function, it measures how far the predictions are from the ground-truth values. Deep learning shrinks the loss through repeated iterations, and defining a good loss function can effectively improve model performance.

    Optimizer: minimizes the loss function so that the model improves over the course of training. A minimal training-loop sketch follows.
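
    A minimal training-loop sketch using MindSpore's wrapper cells (`network`, `loss_fn`, `opt`, and `train_ds` are the objects sketched earlier; the epoch count is illustrative):

      # WithLossCell fuses the forward pass and the loss; TrainOneStepCell
      # adds the backward pass and the optimizer update.
      net_with_loss = nn.WithLossCell(network, loss_fn)
      train_step = nn.TrainOneStepCell(net_with_loss, opt)
      train_step.set_train()

      for epoch in range(10):
          for data, label in train_ds.create_tuple_iterator():
              loss = train_step(data, label)     # one forward/backward/update
          print(f"epoch {epoch}: loss {loss.asnumpy():.4f}")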

  5. Model Inference
    Load the model checkpoint for inference. When loading parameters with the load_checkpoint interface, they must be loaded into the original network, not into the training network that wraps the optimizer and loss function.

    CKPT="save_mobilenetV2_model.ckpt"
    
    def image_process(image):
        """Precess one image per time.
        
        Args:
            image: shape (H, W, C)
        """
        mean=[0.485*255, 0.456*255, 0.406*255]
        std=[0.229*255, 0.224*255, 0.225*255]
        image = (np.array(image) - mean) / std
        image = image.transpose((2,0,1))
        img_tensor = Tensor(np.array([image], np.float32))
        return img_tensor
    
    def infer_one(network, image_path):
        # config (image size, class count) and class_en (the English class-name
        # list) are defined in the tutorial's earlier setup code.
        image = Image.open(image_path).resize((config.image_height, config.image_width))
        logits = network(image_process(image))
        pred = np.argmax(logits.asnumpy(), axis=1)[0]
        print(image_path, class_en[pred])
    
    def infer():
        backbone = MobileNetV2Backbone(last_channel=config.backbone_out_channels)
        head = MobileNetV2Head(input_channel=backbone.out_channels, num_classes=config.num_classes)
        network = mobilenet_v2(backbone, head)
        load_checkpoint(CKPT, network)
        for i in range(91, 100):
            infer_one(network, f'data_en/test/Cardboard/000{i}.jpg')
    infer()
