Computer Vision-CNN

2024-04-24 11:26:01

CNN(Convolutional Neural Network)

A motivating question: classification

Given a feature representation for images, how do we learn a model that distinguishes features from different classes?

The machine learning framework

1: A prediction function f that returns the desired output:
f(🍎) = apple
f(🍅) = tomato
f(🐮) = cow

2: The framework
The framework has two phases:

Training: given a training set {(x1,y1), ..., (xn,yn)}, estimate the prediction function f.
Testing: given f, apply it to an unseen test example x and output the predicted value y = f(x).
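
The two phases can be sketched in a few lines of Python. This is only an illustration: the nearest-class-mean classifier, the `train` helper, and the toy 2-D features are assumptions made for the example and do not come from the notes.

```python
import numpy as np

def train(X, y):
    """Training: estimate a prediction function f from pairs (x_i, y_i).

    Illustrative choice (an assumption): f predicts the class whose mean
    feature vector is closest to the input.
    """
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])

    def f(x):
        # Distance from x to each class mean; pick the nearest class.
        distances = np.linalg.norm(means - x, axis=1)
        return classes[np.argmin(distances)]

    return f

# Made-up 2-D features for two classes.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array(["apple", "apple", "cow", "cow"])

f = train(X_train, y_train)            # training: estimate f
prediction = f(np.array([5.5, 4.0]))   # testing: y = f(x) on an unseen x
```

Any learning algorithm fits the same pattern: training produces f, and testing only calls it.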

Neural Networks (Linear)

Perceptron
Linear classifier: a vector of weights w and a bias b

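This is convolution! The perceptron's weighted sum of its inputs is the same computation that a convolution filter performs at a single image location.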

Example: binary classification of an image

Each pixel of the image is one input, so a 28x28 image is vectorized (flattened) into a row vector x of shape 1x784.

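Here, vectorization means flattening the 2D grid of pixel values into a single 1D vector so that it can be multiplied by the weights. It should not be confused with vectorization in the graphics sense, i.e. converting a raster image into a vector format in which shapes are described by mathematical formulas rather than pixels. That kind of vectorization gives scalability without jagged edges, easy editing, and better print quality, but it is not what is meant in this example.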

w is a vector of weights, one per pixel: 784x1
b is a scalar bias, one per perceptron
result = xw + b -> (1x784)(784x1) + b -> (1x1) + b
[Note: the product xw is a scalar (a dot product)]
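
The shapes above can be checked with a short NumPy sketch. The random image, the random weights, the bias value 0.5, and the threshold at 0 are all placeholders assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((28, 28))        # stand-in for a 28x28 grayscale image
x = image.reshape(1, 784)           # flatten to a row vector, shape (1, 784)

w = rng.standard_normal((784, 1))   # one weight per pixel, shape (784, 1)
b = 0.5                             # scalar bias (arbitrary value)

score = x @ w + b                   # (1, 784) @ (784, 1) -> (1, 1)
label = int(score[0, 0] > 0)        # assumed decision rule: threshold at 0
```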

Multiclass (add more perceptrons)


x is the same as in the example above: x = 1x784
W is a matrix of weights, with one row per pixel and one column per perceptron
W = 784x10 (assuming 10-class classification)
b is a vector of biases, one per perceptron: b = 1x10
result = xW + b = (1x784)(784x10) + b = (1x10) + (1x10) = output vector (1x10)
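
A minimal NumPy sketch of the multiclass case follows. The values are random placeholders, and taking the argmax of the output vector as the predicted class is an assumption of this example rather than something stated in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 784))             # flattened 28x28 image
W = rng.standard_normal((784, 10))   # one column of weights per perceptron/class
b = rng.standard_normal((1, 10))     # one bias per perceptron

scores = x @ W + b                   # (1, 784) @ (784, 10) + (1, 10) -> (1, 10)

# Assumed decision rule: pick the class with the highest score.
predicted_class = int(np.argmax(scores, axis=1)[0])
```

Each column of W is one perceptron from the single-output case, so the matrix product evaluates all ten perceptrons at once.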

Bias convenience

Create a 'fake' feature with constant value 1 to represent the bias
Add an extra weight for that feature that can vary; this weight plays the role of b
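
The bias trick can be verified numerically. In this sketch (random placeholder values; the names `x_aug` and `W_aug` are hypothetical) a constant 1 is appended to x and the bias row is appended to W, after which a single matrix product reproduces xW + b.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 784))
W = rng.standard_normal((784, 10))
b = rng.standard_normal((1, 10))

# Append the 'fake' feature 1 to the input and the biases as an extra weight row.
x_aug = np.hstack([x, np.ones((1, 1))])   # shape (1, 785)
W_aug = np.vstack([W, b])                 # shape (785, 10)

# One matrix product now includes the bias term.
same = np.allclose(x_aug @ W_aug, x @ W + b)
```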

Next, composition:

Outputs from one perceptron are fed into inputs of another perceptron

It’s all just matrix multiplication!

Two problems

1: If every layer is a linear function, the composition of the layers is itself just a single linear function, so stacking layers does not produce a more complex function.

2: Linear classifiers with a hard threshold: a small change in the input can cause a large change in the binary output, which is a problem when functions are composed.
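
The first problem follows from (x W1 + b1) W2 + b2 = x (W1 W2) + (b1 W2 + b2). A small NumPy check with random placeholder weights and an arbitrary hidden width of 32 (both assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 784))
W1, b1 = rng.standard_normal((784, 32)), rng.standard_normal((1, 32))
W2, b2 = rng.standard_normal((32, 10)), rng.standard_normal((1, 10))

# Two linear layers applied one after the other.
two_layers = (x @ W1 + b1) @ W2 + b2

# The equivalent single linear layer.
W = W1 @ W2          # shape (784, 10)
b = b1 @ W2 + b2     # shape (1, 10)
one_layer = x @ W + b

collapsed = np.allclose(two_layers, one_layer)
```

However many linear layers are stacked, the same collapse applies, so depth alone adds nothing without a non-linearity between layers.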

What we want: non-linear functions between layers whose output changes gradually with the input.

Neural Networks (Non-Linearities)

MLP (Multi-layer perceptron)

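With enough hidden units, an MLP can approximate any continuous function on a bounded domain to arbitrary accuracy.
Feeding whole images into a fully connected network is problematic: spatial correlation in images is local, so connecting every pixel to every unit wastes resources, and we do not have enough training samples to fit that many parameters.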
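
A minimal sketch of a two-layer MLP forward pass, which also counts its parameters. The ReLU non-linearity, the hidden width of 32, and the random initialization are assumptions chosen for illustration; the notes do not name a specific activation.

```python
import numpy as np

def relu(z):
    # Element-wise non-linearity (an assumed choice).
    return np.maximum(z, 0.0)

def mlp_forward(x, W1, b1, W2, b2):
    hidden = relu(x @ W1 + b1)   # the non-linearity sits between the linear layers
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
x = rng.random((1, 784))
W1, b1 = rng.standard_normal((784, 32)) * 0.01, np.zeros((1, 32))
W2, b2 = rng.standard_normal((32, 10)) * 0.01, np.zeros((1, 10))

scores = mlp_forward(x, W1, b1, W2, b2)               # shape (1, 10)
n_params = W1.size + b1.size + W2.size + b2.size      # 25450
```

Even this tiny network on a 28x28 image has 25,450 parameters, almost all of them in the first fully connected layer. The count grows with the number of pixels, which is the resource problem described above.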

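So we introduce a new idea: sparse interactions, where each unit connects only to a small local region of the input.

Composing layers expands what each unit sees from local to global.

Note: with this kind of local connectivity, the parameterization works well when the input images are registered (aligned).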
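
To make "local to global" concrete, the sketch below computes how the region of the input seen by one unit grows as locally connected layers are stacked. It assumes stride-1 layers with a 3x3 window and no pooling or dilation; the helper name `receptive_field` is hypothetical. It also compares weight counts for a 28x28 input, ignoring border effects.

```python
def receptive_field(num_layers, kernel_size):
    """Side length of the input region seen by one unit after stacking
    `num_layers` stride-1 layers, each with a kernel_size x kernel_size window."""
    return 1 + num_layers * (kernel_size - 1)

sizes = [receptive_field(n, 3) for n in range(1, 5)]   # [3, 5, 7, 9]

# Number of 3x3 layers needed before one unit can see a full 28x28 image.
layers_to_cover_28 = next(n for n in range(1, 100) if receptive_field(n, 3) >= 28)

# Weights for one layer mapping 784 inputs to 784 units.
dense_weights = 784 * 784   # every unit connected to every pixel: 614656
local_weights = 784 * 9     # every unit connected to a 3x3 neighborhood: 7056
```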

Convolution Layer

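
As a rough sketch of what a convolution layer computes, the function below slides one small kernel over a single-channel image and takes a weighted sum at every position, reusing the same weights everywhere. It assumes stride 1, no padding, and a single filter. The name `conv2d_valid` and the example kernel are hypothetical. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Apply `kernel` at every valid position of `image` (stride 1, no padding)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            # Same weighted sum as a perceptron, restricted to a local patch.
            out[i, j] = np.sum(patch * kernel) + bias
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])   # difference along the diagonal

feature_map = conv2d_valid(image, kernel)   # shape (3, 3)
```

On this ramp image every diagonal difference is the same, so all nine entries of `feature_map` equal -5.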

Pooling Layer: Receptive Field Size


Pooling is similar to downsampling

In a convolutional neural network, a pooling layer is commonly placed after a convolution layer (max pooling is used more often than average pooling).
There are several kinds of pooling layers, such as max pooling and average pooling.
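
A small sketch of 2x2 pooling with stride 2 on a single-channel feature map, showing both variants. The window size, the stride, and the helper name `pool2x2` are assumptions for illustration.

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    """Downsample by 2 in each dimension using non-overlapping 2x2 windows."""
    H, W = feature_map.shape
    # Drop a trailing odd row/column, then group pixels into 2x2 blocks.
    blocks = feature_map[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

feature_map = np.array([[1.0, 2.0, 5.0, 6.0],
                        [3.0, 4.0, 7.0, 8.0],
                        [9.0, 10.0, 13.0, 14.0],
                        [11.0, 12.0, 15.0, 16.0]])

max_pooled = pool2x2(feature_map, "max")          # [[4, 8], [12, 16]]
avg_pooled = pool2x2(feature_map, "average")      # [[2.5, 6.5], [10.5, 14.5]]
```

Both variants halve the height and width, which is why pooling behaves like downsampling. Max pooling keeps the strongest response in each window, while average pooling smooths over it.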

Local Contrast Normalization


Original article: https://blog.csdn.net/yanlingyun0210/article/details/137889333
