Getting Started with Deep Learning (6) - 3DV (3D Vision)

3DV

Two focuses: predicting 3D shapes from images, and processing 3D input data

Representations of 3D shape

Depth map

For each pixel, a depth map gives the distance from the camera to the object in the world at that pixel

RGB image + Depth image = RGB-D Image (2.5D)

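We can use a fully convolutional network to predict the depth

Problem: scale / depth ambiguity (from a single image, a small object close to the camera looks the same as a large object far away)

-> Use a scale-invariant loss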
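The notes do not say which scale-invariant loss is meant. As a minimal sketch, assuming the commonly used log-depth formulation (mean squared log difference minus the squared mean log difference, weighted by a hypothetical parameter `lam`), multiplying every predicted depth by the same constant leaves the loss unchanged when `lam=1`:

```python
import math

def scale_invariant_loss(pred, target, lam=1.0):
    """Scale-invariant log-depth loss over two equal-length lists of positive depths.

    Assumed formulation: mean(d^2) - lam * mean(d)^2, with d = log(pred) - log(target).
    """
    n = len(pred)
    # Per-pixel difference in log space.
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    return sum(x * x for x in d) / n - lam * (sum(d) / n) ** 2

target = [1.0, 2.0, 4.0]
scaled = [3.0 * t for t in target]    # correct shape, wrong global scale
distorted = [1.0, 2.0, 8.0]           # relative depths are wrong

print(scale_invariant_loss(scaled, target))     # approximately 0
print(scale_invariant_loss(distorted, target))  # clearly positive
```

With `lam=0` the expression reduces to an ordinary squared error in log space, which does penalize a global scale error.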

Surface Normals

For each pixel, surface normals give a vector normal to the surface of the object in the world at that pixel

We can use Fully Convolutional network to predict Surface Normals

loss: $\frac{x \cdot y}{|x||y|}$ (the cosine similarity between the predicted normal $x$ and the ground-truth normal $y$)
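A small sketch of the per-pixel term. The cosine similarity itself is largest when the normals agree, so the version below assumes the loss is taken as `1 - cosine similarity`, averaged over pixels. That sign convention and the function names are assumptions, not something the notes state:

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two 3D vectors given as tuples."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def normal_loss(pred_normals, gt_normals):
    """Mean of (1 - cosine similarity) over all pixels (assumed convention)."""
    terms = [1.0 - cosine_similarity(p, g) for p, g in zip(pred_normals, gt_normals)]
    return sum(terms) / len(terms)

pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 2.0), (0.0, 1.0, 0.0)]   # first pixel agrees, second is orthogonal
print(normal_loss(pred, gt))  # 0.5
```

Only the direction of each vector matters here, which is why the first pixel contributes zero even though the two vectors have different lengths.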

Like depth maps, surface normals can’t represent occluded parts of objects

Voxel Grid

Represent a shape with a $V \times V \times V$ grid of occupancies (just like Minecraft 😃)

Problems: need high spatial resolution to capture fine structures, and scaling to high resolutions is not trivial
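To see why scaling is hard, here is the cubic growth worked out for a dense grid, assuming one 4-byte float per voxel (the storage format is an assumption made only for this estimate):

```python
def voxel_grid_bytes(v, bytes_per_voxel=4):
    """Memory for a dense V x V x V grid with a fixed number of bytes per voxel."""
    return v ** 3 * bytes_per_voxel

for v in (32, 128, 1024):
    mib = voxel_grid_bytes(v) / 2 ** 20
    print(f"V={v}: {mib} MiB")
# V=32: 0.125 MiB
# V=128: 8.0 MiB
# V=1024: 4096.0 MiB
```

Doubling the resolution multiplies memory by 8, and this counts a single grid only, not the intermediate activations of a 3D CNN.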

Use 3D convolution to do classification

We can have the following architecture :

image -> 2D CNN -> fully connected layer -> 3D CNN -> Voxels

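but 3D convolution is expensive

we can use “Voxel Tubes” instead: use only 2D convolutions and interpret the V output channels of the final layer as a tube of V occupancies along the depth axis

Voxel tubes sacrifice spatial structure along the z dimension, and the memory usage of a dense voxel grid is still not affordable at high resolution.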

Solution : Oct-Trees

use voxel grids with heterogeneous resolution

Nested Shape Layers

Predict shape as a composition of positive and negative grids

Implicit Surface

learn a function $o: \mathbb{R}^3 \rightarrow \{0,1\}$

to classify arbitrary 3D points as inside / outside the shape

Same idea: a signed distance function (SDF) gives the Euclidean distance from a point to the surface of the shape, with the sign indicating whether the point is inside or outside
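In practice the function is a learned network. As a stand-in, the sketch below uses an analytic unit sphere to show how an occupancy function and a signed distance function relate. The sphere and the "negative means inside" sign convention are assumptions chosen for illustration:

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def occupancy(p):
    """Occupancy o(p) in {0, 1}, derived by thresholding the signed distance at 0."""
    return 1 if sphere_sdf(p) < 0 else 0

for p in [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    print(p, sphere_sdf(p), occupancy(p))
```

Both functions can be queried at arbitrary 3D points, so the resolution of the shape is not tied to a fixed grid.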

Point Cloud

represent shape as a set of P points in 3D space

nice property: can represent fine structure without a huge number of points

bad property: doesn’t explicitly represent the surface of the shape

PointNet

Input pointcloud --(MLP on each point)-> point features --(max pooling)-> pooled vector --(FC)-> class score

We want to process pointclouds as sets : order should not matter
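A toy illustration of why this pipeline treats the input as a set: the same function is applied to every point, and max pooling does not depend on order. The single linear layer with ReLU and its hand-picked weights are hypothetical stand-ins for the shared MLP, not PointNet's actual layers:

```python
def point_feature(p):
    """Stand-in for the shared per-point MLP: one linear layer + ReLU, made-up weights."""
    weights = [
        (1.0, 0.0, -1.0),
        (0.5, 0.5, 0.5),
        (-1.0, 2.0, 0.0),
        (0.0, -1.0, 1.0),
    ]
    return [max(0.0, sum(w * x for w, x in zip(row, p))) for row in weights]

def pooled_vector(points):
    """Element-wise max over all per-point feature vectors."""
    feats = [point_feature(p) for p in points]
    return [max(column) for column in zip(*feats)]

cloud = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
shuffled = [cloud[2], cloud[0], cloud[3], cloud[1]]

print(pooled_vector(cloud))
print(pooled_vector(cloud) == pooled_vector(shuffled))  # True
```

Any symmetric pooling (max, sum, mean) would give the same order invariance; the notes use max pooling.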

Generating Pointcloud Outputs

Loss function (new):

Chamfer distance: sum of L2 distance to each point’s nearest neighbor in the other set
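A brute-force sketch of the Chamfer distance as described above, using plain Euclidean distance. Many formulations use the squared distance instead, so treat the exact form as an assumption:

```python
import math

def chamfer_distance(a, b):
    """Sum of nearest-neighbor distances from a to b, plus from b to a."""
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b)
    return a_to_b + b_to_a

pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]     # same points, different order
print(chamfer_distance(pred, gt))            # 0.0

print(chamfer_distance([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 10.0
```

The loss needs no correspondence between the two sets and does not depend on point order, which suits set-valued outputs. This version is O(|a| * |b|); real implementations typically use batched tensor operations or spatial indexing.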

Mesh

Triangle Mesh

represent a 3D shape as a set of triangles

Vertices: Set of V points in 3D space

Faces: Set of triangles over the vertices

We can attach data to vertices and interpolate over the whole surface

However, meshes are nontrivial to process with neural nets
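The vertex/face layout and the interpolation of vertex data can be sketched as follows. The tetrahedron, the per-vertex `temperature` values, and the use of barycentric weights are illustrative assumptions:

```python
# A tetrahedron: 4 vertices, 4 triangular faces that index into the vertex list.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

# Hypothetical scalar attached to each vertex (could be color, texture coords, ...).
temperature = [10.0, 20.0, 30.0, 40.0]

def interpolate(vertex_values, bary):
    """Blend three per-vertex values with barycentric weights (w0, w1, w2) summing to 1."""
    return sum(w * v for w, v in zip(bary, vertex_values))

face = faces[0]
values = [temperature[i] for i in face]

print(interpolate(values, (1.0, 0.0, 0.0)))        # at the first corner: 10.0
print(interpolate(values, (1 / 3, 1 / 3, 1 / 3)))  # at the centroid: about 20
```

The same weights can blend vertex positions, so any point on a face is determined by its face index and three barycentric weights.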

Pixel2Mesh

key ideas:

  1. Iterative mesh refinement

     Start from an initial ellipsoid mesh and repeatedly deform it

  2. Graph convolution

     Input: a graph with a feature vector attached to every vertex

     Output: a new feature vector for every vertex

     $f_i' = W_0 f_i + \sum_{j \in N(i)} W_1 f_j$, where $N(i)$ is the set of vertices adjacent to vertex $i$

  3. Vertex-aligned features

     For each vertex of the mesh, use camera information to project it onto the image plane

     Use bilinear interpolation to sample a CNN feature at the projected location

  4. Loss function

     Convert meshes to pointclouds (by sampling points on the surface), then compute the loss -> avoids different mesh representations of the same shape giving different losses
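The graph convolution update from the list above can be written out directly. This is a minimal sketch with hand-picked weight matrices and a three-vertex graph, all of which are hypothetical. A real layer would learn `w0` and `w1` and usually add a nonlinearity:

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def graph_conv(features, neighbors, w0, w1):
    """f_i' = W0 f_i + sum over j in N(i) of W1 f_j, for every vertex i."""
    out = []
    for i, f_i in enumerate(features):
        new_f = matvec(w0, f_i)
        for j in neighbors[i]:
            message = matvec(w1, features[j])
            new_f = [a + b for a, b in zip(new_f, message)]
        out.append(new_f)
    return out

# A single triangle: every vertex is adjacent to the other two.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w0 = [[1.0, 0.0], [0.0, 1.0]]   # identity on the vertex's own feature
w1 = [[0.5, 0.0], [0.0, 0.5]]   # half of each neighbor's feature

print(graph_conv(features, neighbors, w0, w1))
# [[1.5, 1.0], [1.0, 1.5], [1.5, 1.5]]
```

Because the same `w0` and `w1` are shared by all vertices, the layer works on meshes with any number of vertices and any connectivity.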

Metrics

Chamfer distance on pointclouds

sensitive to outliers, since a single distant point can dominate the sum

F1 score on pointclouds

Precision@t = fraction of predicted points within t of some ground-truth point

Recall@t = fraction of ground-truth points within t of some predicted point

$\text{F1}@t = 2\,\frac{\text{Precision}@t \cdot \text{Recall}@t}{\text{Precision}@t + \text{Recall}@t}$
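A brute-force sketch of the F1 score on two pointclouds, following the definitions above. Treating "within t" as `<= t` and returning 0 when both precision and recall are 0 are assumptions:

```python
import math

def f1_at_t(pred, gt, t):
    """F1 score at distance threshold t between predicted and ground-truth points."""
    precision = sum(1 for p in pred if min(math.dist(p, g) for g in gt) <= t) / len(pred)
    recall = sum(1 for g in gt if min(math.dist(g, p) for p in pred) <= t) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]   # second point is a far outlier
gt = [(0.0, 0.0, 0.05), (1.0, 0.0, 0.0)]

print(f1_at_t(pred, gt, t=0.1))  # 0.5
```

Moving the outlier from x=10 to x=1000 leaves this score unchanged, whereas the Chamfer distance would grow with it. That is why F1 is the more robust metric.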

Cameras: Canonical vs View Coordinates

Problem: models that predict in canonical coordinates tend to overfit more than models that predict in view coordinates

Dataset

ShapeNet: synthetic, no context
Pix3D: real images, but small

Mesh R-CNN

Mesh deformation gives good results but the topology is fixed by the initial mesh

Approach: Use voxel predictions to create initial mesh prediction

This helps predict shapes with holes

Add an L2 regularizer on mesh edge lengths as well

Amodal completion: predict occluded parts of the objects
