【Machine Learning】Suitable Learning Rate in Machine Learning

1. How different learning rates behave:

        In gradient descent, each parameter is updated by stepping against its partial derivative of the cost. For w:

w = w - \alpha \frac{ \partial J(w,b) }{ \partial w }
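        One update step can be sketched in Python. This section does not define J(w, b), so the sketch assumes it is the mean squared error cost (1/(2m)) * sum((w*x_i + b - y_i)^2) of a one-feature linear model. The function name gradient_step is hypothetical. The formula above shows only the update for w. The sketch updates b the same way using dJ/db, and it computes both gradients from the old values before changing either parameter.

```python
def gradient_step(w, b, xs, ys, alpha):
    """Perform one gradient-descent update of (w, b).

    Assumes (hypothetically) J(w, b) = (1 / (2m)) * sum((w*x + b - y)**2), so
    dJ/dw = (1/m) * sum((w*x + b - y) * x) and dJ/db = (1/m) * sum(w*x + b - y).
    """
    m = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    dj_dw = sum(e * x for e, x in zip(errors, xs)) / m
    dj_db = sum(errors) / m
    # Both gradients use the old (w, b); neither parameter is changed before the other.
    return w - alpha * dj_dw, b - alpha * dj_db


# Usage: data generated by y = 2x, starting from w = b = 0 with alpha = 0.1.
w, b = gradient_step(0.0, 0.0, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], alpha=0.1)
print(w, b)
```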

        Here \alpha is the learning rate. How should it be chosen, and what happens if it is too large or too small? We analyze this with the following graphs:

        As before, to visualize this equation we set b = 0 in J(w, b). The cost then depends only on w and can be drawn as a two-dimensional plot of J against w:

        First, consider a small learning rate (starting from point F):

        In this case the steps are small, so the algorithm is very likely to reach the minimum and converge, although it may need many iterations to get there.

        Next, consider larger learning rates:

        When the learning rate is large but still below a certain limit, gradient descent can also converge. The update formula explains why: each step moves w by \alpha times the slope. As w moves to points where the slope is smaller, \alpha stays the same but the slope decreases, so the steps shrink and the iterates keep descending until they converge. Does this hold for every large learning rate? Consider the following case:

        The difference from the previous case is that a step may jump over the optimal point, so the value the algorithm finally converges to may not be the optimum.
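        The behavior described so far can be made precise on a simple one-dimensional cost. As an illustrative assumption (not necessarily the cost drawn in the figures), take a quadratic J(w) = a(w - w^*)^2 with a > 0 and its minimum at w^*. Its slope is 2a(w - w^*), so the update rule becomes:

$$
w_{t+1} = w_t - \alpha \cdot 2a\,(w_t - w^*)
\quad\Longrightarrow\quad
w_{t+1} - w^* = (1 - 2a\alpha)\,(w_t - w^*)
$$

        The step length 2a\alpha|w_t - w^*| shrinks as w_t approaches w^*, even though \alpha is fixed. This is the effect described above. Each update multiplies the distance to the minimum by 1 - 2a\alpha:

        - For 0 < \alpha <= 1/(2a), the factor lies in [0, 1), and w approaches w^* from one side.
        - For 1/(2a) < \alpha < 1/a, the factor lies in (-1, 0), so w jumps across w^* at every step but still converges.
        - At \alpha = 1/a, the factor equals -1, and w bounces between two points forever.
        - For \alpha > 1/a, the factor's magnitude exceeds 1. Each overshoot is larger than the last, and the iteration diverges.

        On this single-minimum quadratic, every convergent run ends at w^*. The non-optimal outcome described above arises on costs with more than one minimum.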

        Finally, if the learning rate is too large, each step overshoots the minimum by more than the previous one and the algorithm diverges:

        These cases can be summarized roughly as follows:

        In the figure, loss measures the difference between the model's predictions and the actual labels. An epoch is one complete pass over the training data, and it can include many iterations of parameter updates.
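        The regimes above can be reproduced numerically as loss-per-epoch histories. This minimal sketch uses a hypothetical cost J(w) = (w - 3)^2, whose slope is 2(w - 3). For this cost, each update multiplies the distance to the minimum by (1 - 2\alpha), so it converges only for 0 < \alpha < 1. The learning rates 0.05, 0.4, 0.9 and 1.1 and the name run_gradient_descent are illustrative choices, not values from the original figures. The sketch uses full-batch updates, so each epoch is exactly one update. With mini-batches, an epoch would contain several updates.

```python
def run_gradient_descent(alpha, w0=0.0, epochs=20):
    """Full-batch gradient descent on the illustrative cost J(w) = (w - 3)**2.

    Returns the list of w values and the list of costs, one entry per epoch
    (with full-batch updates, each epoch is exactly one update).
    """
    w = w0
    ws, losses = [w], [(w - 3.0) ** 2]
    for _ in range(epochs):
        slope = 2.0 * (w - 3.0)  # dJ/dw for J(w) = (w - 3)**2
        w = w - alpha * slope    # the update rule w = w - alpha * dJ/dw
        ws.append(w)
        losses.append((w - 3.0) ** 2)
    return ws, losses


for alpha in (0.05, 0.4, 0.9, 1.1):
    ws, losses = run_gradient_descent(alpha)
    print(f"alpha={alpha}: w after epochs 1-3 {[round(v, 3) for v in ws[1:4]]}, "
          f"loss after 20 epochs {losses[-1]:.3g}")
```

        Here 0.05 converges slowly from one side, and 0.4 converges quickly. With 0.9, w lands on the opposite side of the minimum at every step yet still converges (the "large but within the limit" case). With 1.1, the loss grows without bound.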

2. How to choose a suitable learning rate:

        In practice, the learning rate can be adjusted during training by observing how well the model fits. After each iteration, evaluate the error function with the current parameter estimates:

        - If the error decreased compared with the previous iteration, the learning rate can be increased.
        - If the error increased, reset the parameters to their values from the previous iteration and reduce the learning rate to 50% of its previous value.

        This is a simple form of adaptive learning-rate adjustment. Deep learning frameworks such as Caffe and TensorFlow provide simple, direct ways to change the learning rate dynamically during training.
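        The adjustment rule above (sometimes called the "bold driver" heuristic) can be sketched as follows. The 50% cut on failure comes from the text. The 5% increase on success is an assumption, since the text only says the rate "can be increased". The cost J(w) = (w - 3)^2 and the name adaptive_gradient_descent are likewise hypothetical.

```python
def adaptive_gradient_descent(cost, grad, w0, alpha, steps, grow=1.05, shrink=0.5):
    """Gradient descent that adjusts the learning rate after every iteration.

    If the cost decreased, keep the new parameters and multiply alpha by `grow`
    (1.05 is an assumed value). If the cost did not decrease, discard the step,
    keep the previous parameters, and multiply alpha by `shrink` (0.5, i.e. 50%).
    """
    w = w0
    current = cost(w)
    for _ in range(steps):
        candidate = w - alpha * grad(w)
        new_cost = cost(candidate)
        if new_cost < current:
            w, current = candidate, new_cost  # accept the step
            alpha *= grow
        else:
            alpha *= shrink  # reject: w keeps its value from the previous iteration
    return w, alpha


# Usage on the illustrative cost J(w) = (w - 3)**2, started with an alpha that
# would diverge if it were kept fixed.
w, alpha = adaptive_gradient_descent(
    cost=lambda v: (v - 3.0) ** 2,
    grad=lambda v: 2.0 * (v - 3.0),
    w0=0.0,
    alpha=1.5,
    steps=50,
)
print(f"w={w:.6f}, final alpha={alpha:.4f}")
```

        The first step with alpha = 1.5 raises the cost, so it is rejected and alpha drops to 0.75. After that, the run settles close to w = 3.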

        Commonly used learning rates to try are 0.00001, 0.0001, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, and 10.
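        One way to use this list is a coarse sweep: run each candidate for the same short budget and keep the one with the lowest loss. This minimal sketch assumes a hypothetical cost J(w) = (w - 3)^2 and a budget of 20 iterations. On a real model you would instead compare the loss on held-out data after a short training run.

```python
import math

CANDIDATE_RATES = [0.00001, 0.0001, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]


def final_loss(alpha, steps=20):
    """Run gradient descent on the illustrative cost J(w) = (w - 3)**2; return the last cost."""
    w = 0.0
    for _ in range(steps):
        w -= alpha * 2.0 * (w - 3.0)  # w = w - alpha * dJ/dw
        if not math.isfinite(w):
            return math.inf  # diverged far enough to overflow
    return (w - 3.0) ** 2


results = {alpha: final_loss(alpha) for alpha in CANDIDATE_RATES}
best = min(results, key=results.get)
for alpha, loss in results.items():
    print(f"alpha={alpha:<8} loss={loss:.3g}")
print("best learning rate:", best)
```

        On this toy cost, rates below 0.1 are still far from the minimum after 20 steps. Rate 1 oscillates without converging, and 3 and 10 diverge, which leaves 0.3 as the best candidate.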
