Principles of Spectral Clustering

Spectral Clustering

Spectral clustering algorithm workflow:
input: $X=\{x_1,x_2,...,x_n\}$
output: cluster partition $C=\{c_1,c_2,...,c_{k_2}\}$

(1) Build the sample similarity matrix $S$ using the chosen similarity-matrix construction method.
(2) From $S$, build the adjacency matrix $W$ and the degree matrix $D$.
(3) Compute the Laplacian matrix $L$.
(4) Form the normalized Laplacian $D^{-\frac12}LD^{-\frac12}$.
(5) Compute the eigenvectors $f$ corresponding to the $k_1$ smallest eigenvalues of $D^{-\frac12}LD^{-\frac12}$.
(6) Stack these eigenvectors into a matrix and normalize it to obtain the $n\times k_1$ feature matrix $F$.
(7) Treat each row of $F$ as a $k_1$-dimensional sample ($n$ samples in total) and run the chosen clustering method on them, clustering into $k_2$ clusters.
(8) Output the cluster partition $(c_1,...,c_{k_2})$.
In general, $k_1$ is taken to be equal to $k_2$.
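To make these steps concrete, here is a minimal NumPy sketch of the pipeline. It is an illustration under assumptions of my own, not a canonical implementation: a fully connected Gaussian-kernel graph stands in for steps (1)–(2), the bandwidth `theta` is arbitrary, and `sklearn.cluster.KMeans` plays the role of the clustering method in step (7).

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k1, k2, theta=1.0):
    # (1)-(2) Gaussian-kernel similarities, used directly as the adjacency matrix W
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * theta ** 2))
    np.fill_diagonal(W, 0.0)                          # no self-loops
    d = W.sum(axis=1)                                 # (2) degrees, D = diag(W 1_n)
    L = np.diag(d) - W                                # (3) Laplacian L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]  # (4) D^{-1/2} L D^{-1/2}
    _, vecs = np.linalg.eigh(L_sym)                   # eigenvalues come out ascending
    F = vecs[:, :k1]                                  # (5) eigenvectors of k1 smallest eigenvalues
    F = F / np.linalg.norm(F, axis=1, keepdims=True)  # (6) row-normalized n x k1 feature matrix
    return KMeans(n_clusters=k2, n_init=10).fit_predict(F)  # (7)-(8) cluster the n rows

# Example: two well-separated blobs, with k1 = k2 = 2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
print(spectral_clustering(X, k1=2, k2=2))
```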

The Idea Behind Spectral Clustering

Spectral clustering has its roots in graph theory. It treats each sample in the dataset as a vertex of a graph; the vertices are joined by weighted edges, and the weight of an edge measures how similar the two samples are. Vertices belonging to the same cluster are highly similar, which in graph terms means the edges among them carry large weights, while edges between vertices of different clusters carry small weights. The goal of spectral clustering is therefore to find a way of cutting the graph such that, after the cut, the edge weights inside each subgraph are large and the edge weights between subgraphs are small.

Graph-based view (weighted undirected graph)
$$
\begin{aligned}
&\text{Sample data: } X=(x_1,...,x_N)^\top\\
&\text{Undirected graph: } G=\{V,E\}\\
&\text{Vertex set: } V=\{1,2,...,N\}\Leftrightarrow X\\
&\text{Edge set } E\text{: described by the similarity (affinity) matrix}\\
&W=\begin{bmatrix} w_{11} & w_{12} & ... & w_{1N} \\ w_{21} & w_{22} & ... & w_{2N}\\ ... & ... & ... & ...\\ w_{N1} & w_{N2} & ... & w_{NN} \end{bmatrix}=[w_{ij}],\quad 1\le i,j\le N\\
&\text{where } w_{ij}=\begin{cases} K(x_i,x_j)=\exp\left\{-\frac{\|x_i-x_j\|_2^2}{2\theta^2}\right\} & \text{if } (i,j)\in E \\ 0 & \text{if } (i,j)\notin E \end{cases}\\
&\text{Degree of vertex } i\text{: } d_i=\sum_{j=1}^N w_{ij}\\
&\text{Degree matrix: } D=\mathrm{diag}(W\mathbf{1}_N)=\begin{bmatrix} d_1 & 0 & ... & 0 \\ 0 & d_2 & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & d_N \end{bmatrix}=\begin{bmatrix} \sum_{j=1}^N w_{1j} & 0 & ... & 0 \\ 0 & \sum_{j=1}^N w_{2j} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{j=1}^N w_{Nj} \end{bmatrix}\\
&\text{Laplacian matrix: } L=D-W
\end{aligned}
$$
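A quick numerical sanity check of these definitions (the fully connected Gaussian-kernel graph, the sample matrix, and θ = 1 are made up for the demo): $L=D-W$ is symmetric, each of its rows sums to zero, and it is positive semidefinite, which is the property Theorems 1 and 2 below rely on.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
theta = 1.0

# w_ij = exp(-||x_i - x_j||_2^2 / (2 theta^2)) on a fully connected graph
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
W = np.exp(-sq / (2 * theta ** 2))
np.fill_diagonal(W, 0.0)

D = np.diag(W @ np.ones(len(X)))    # D = diag(W 1_N)
L = D - W                           # Laplacian L = D - W

print(np.allclose(L, L.T))                       # symmetric
print(np.allclose(L @ np.ones(len(X)), 0.0))     # rows sum to zero
print(np.linalg.eigvalsh(L).min() >= -1e-10)     # positive semidefinite
```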

Definition:
$$
A\subset V,\ B\subset V,\ A\cap B=\emptyset\\
\Rightarrow W(A,B)=\sum_{i\in A,\,j\in B} w_{ij}
$$
Suppose there are $K$ clusters in total:
$$
Cut(V)=Cut(A_1,...,A_K)=\sum_{k=1}^K W(A_k,\overline A_k)=\sum_{k=1}^K W(A_k,V)-\sum_{k=1}^K W(A_k,A_k)
$$
Objective:
$$
\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)
$$
Definition of Ncut:
$$
\begin{aligned}
cut(V)&=\sum_{k=1}^K W(A_k,\overline A_k)\\
\Rightarrow Ncut&=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\Delta},\qquad \Delta=degree(A_k)=\sum_{i\in A_k}d_i,\quad d_i=\sum_{j=1}^N w_{ij}\\
&=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}\\
&=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}d_i}\\
&=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}\sum_{j=1}^N w_{ij}}
\end{aligned}
$$
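To see the definitions at work, here is a small sketch that evaluates $W(A,B)$ and Ncut straight from the formulas (the toy weight matrix and the 2-way partition are made up for the example):

```python
import numpy as np

def W_AB(W, A, B):
    # W(A, B) = sum of w_ij over i in A, j in B
    return W[np.ix_(A, B)].sum()

def ncut(W, partition):
    # Ncut = sum_k W(A_k, complement of A_k) / sum_{i in A_k} d_i
    d = W.sum(axis=1)
    all_idx = np.arange(W.shape[0])
    return sum(W_AB(W, A, np.setdiff1d(all_idx, A)) / d[A].sum()
               for A in partition)

# Toy symmetric weight matrix: vertices 0-2 tightly connected, vertex 3 loose
W = np.array([[0.0, 5.0, 4.0, 0.1],
              [5.0, 0.0, 3.0, 0.2],
              [4.0, 3.0, 0.0, 0.1],
              [0.1, 0.2, 0.1, 0.0]])
partition = [np.array([0, 1, 2]), np.array([3])]
print(ncut(W, partition))   # small value: little weight crosses this cut
```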
Model
$$
\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)=\underset{\{A_k\}_{k=1}^K}{\min}\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}\\
\Rightarrow \{A_k\}_{k=1}^K=\arg\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)
$$
indicator vector:
$$
\begin{cases} y_i\in\{0,1\}^K \\ \sum_{j=1}^K y_{ij}=1 \end{cases}\qquad y_i=\begin{bmatrix} y_{i1}\\ y_{i2}\\ ...\\ y_{iK} \end{bmatrix}\qquad y_{ij}=1\Leftrightarrow \text{sample } i \text{ belongs to cluster } j
$$

$$
Y=[y_1,...,y_N]^\top_{N\times K}
$$

Recasting the problem in terms of $Y$: $\hat Y=\arg\underset{Y}{\min}\ Ncut(V)$

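An indicator matrix of this form is easy to build from a label vector; a minimal sketch (the label vector is made up):

```python
import numpy as np

labels = np.array([0, 0, 1, 2, 1, 0])   # made-up cluster assignments, K = 3
N, K = len(labels), labels.max() + 1
Y = np.zeros((N, K))
Y[np.arange(N), labels] = 1.0            # y_ij = 1 iff sample i belongs to cluster j
print(Y)
print(Y.sum(axis=1))                     # every row sums to 1 (one-hot)
```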

Rewriting Ncut in matrix form:
$$
\begin{aligned}
Ncut&=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}\\
&=Tr\begin{bmatrix} \frac{W(A_1,\overline A_1)}{\sum_{i\in A_1}d_i} & 0 & ... & 0 \\ 0 & \frac{W(A_2,\overline A_2)}{\sum_{i\in A_2}d_i} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \frac{W(A_K,\overline A_K)}{\sum_{i\in A_K}d_i} \end{bmatrix}\\
&=Tr\left(\begin{bmatrix} W(A_1,\overline A_1) & 0 & ... & 0 \\ 0 & W(A_2,\overline A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K,\overline A_K) \end{bmatrix}\begin{bmatrix} \sum_{i\in A_1}d_i & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}d_i & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}d_i \end{bmatrix}^{-1}\right)
\end{aligned}
$$

$$
\text{Let } O_{K\times K}=\begin{bmatrix} W(A_1,\overline A_1) & 0 & ... & 0 \\ 0 & W(A_2,\overline A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K,\overline A_K) \end{bmatrix},\qquad P_{K\times K}=\begin{bmatrix} \sum_{i\in A_1}d_i & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}d_i & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}d_i \end{bmatrix}
$$

The problem is now reduced to: $Ncut(V)=Tr(OP^{-1})$

Given $W$ and $Y$, we solve for $O$ and $P$:

First, solve for $P$:
$$
Y^\top Y=[y_1,...,y_N]\begin{bmatrix} y_1^\top\\ y_2^\top\\ ...\\ y_N^\top \end{bmatrix}=\sum_{i=1}^N y_i y_i^\top=\begin{bmatrix} N_1 & 0 & ... & 0 \\ 0 & N_2 & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & N_K \end{bmatrix}_{K\times K}=\begin{bmatrix} \sum_{i\in A_1}1 & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}1 & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}1 \end{bmatrix}_{K\times K}
$$
The meaning of $N_k$: among the $N$ samples, the number that belong to cluster $k$; $\sum_{k=1}^K N_k=N$ and $N_k=|A_k|=\sum_{i\in A_k}1$.
$$
\sum_{i=1}^N y_i d_i y_i^\top=y_1 d_1 y_1^\top+y_2 d_2 y_2^\top+...+y_N d_N y_N^\top=Y^\top DY\\
P_{K\times K}=\begin{bmatrix} \sum_{i\in A_1}d_i & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}d_i & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}d_i \end{bmatrix}=Y^\top DY\\
\text{where } D=\begin{bmatrix} d_1 & 0 & ... & 0 \\ 0 & d_2 & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & d_N \end{bmatrix}=\mathrm{diag}(W\mathbf{1}_N)=\begin{bmatrix} \sum_{j=1}^N w_{1j} & 0 & ... & 0 \\ 0 & \sum_{j=1}^N w_{2j} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{j=1}^N w_{Nj} \end{bmatrix}
$$
So the $P$ we are after is:
$$
P=Y^\top DY,\qquad \text{where } D=\mathrm{diag}(W\mathbf{1}_N)
$$
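Both identities above, $Y^\top Y=\mathrm{diag}(N_1,...,N_K)$ and $P=Y^\top DY$, are easy to verify numerically; a self-contained sketch with a made-up weight matrix and made-up labels:

```python
import numpy as np

rng = np.random.default_rng(2)
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1])   # made-up assignments, K = 3
N, K = len(labels), 3
Y = np.zeros((N, K))
Y[np.arange(N), labels] = 1.0

A = rng.random((N, N))
W = (A + A.T) / 2                              # made-up symmetric weights
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
D = np.diag(d)

# Y^T Y = diag(N_1, ..., N_K): cluster sizes on the diagonal
print(np.allclose(Y.T @ Y, np.diag([(labels == k).sum() for k in range(K)])))
# P = Y^T D Y = diag of summed degrees within each cluster
print(np.allclose(Y.T @ D @ Y, np.diag([d[labels == k].sum() for k in range(K)])))
```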
Next, solve for $O$:
$$
\begin{aligned}
&O_{K\times K}=\begin{bmatrix} W(A_1,\overline A_1) & 0 & ... & 0 \\ 0 & W(A_2,\overline A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K,\overline A_K) \end{bmatrix}\\
&W(A_k,\overline{A_k})=\underbrace{W(A_k,V)}_{\sum_{i\in A_k}d_i}-\underbrace{W(A_k,A_k)}_{\sum_{i\in A_k}\sum_{j\in A_k}w_{ij}}\\
&\Rightarrow O=\begin{bmatrix} \sum_{i\in A_1}d_i & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}d_i & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}d_i \end{bmatrix}-\begin{bmatrix} W(A_1,A_1) & 0 & ... & 0 \\ 0 & W(A_2,A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K,A_K) \end{bmatrix}
\end{aligned}
$$
We already know the first matrix: it is $Y^\top DY$. Now look at the second part:
$$
\begin{bmatrix} W(A_1,A_1) & 0 & ... & 0 \\ 0 & W(A_2,A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K,A_K) \end{bmatrix}
$$

$$
=\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}
$$

We conjecture that this second part equals $Y^\top WY$; let us verify:
$$
\begin{aligned}
&Y^\top WY \text{ is } K\times K\\
Y^\top WY&=[y_1,...,y_N]\begin{bmatrix} w_{11} & w_{12} & ... & w_{1N} \\ w_{21} & w_{22} & ... & w_{2N}\\ ... & ... & ... & ...\\ w_{N1} & w_{N2} & ... & w_{NN} \end{bmatrix}\begin{bmatrix} y_1^\top\\ y_2^\top\\ ...\\ y_N^\top \end{bmatrix}\\
&=\left[\sum_{i=1}^N y_i w_{i1},...,\sum_{i=1}^N y_i w_{iN}\right]\begin{bmatrix} y_1^\top\\ y_2^\top\\ ...\\ y_N^\top \end{bmatrix}\\
&=\sum_{i=1}^N\sum_{j=1}^N y_i w_{ij} y_j^\top=\sum_{i=1}^N\sum_{j=1}^N y_i y_j^\top w_{ij}\\
&=\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij} \\ \sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij}\\ ... & ... & ... & ...\\ \sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}
\end{aligned}
$$
Compare the matrix above with the second part of $O$:
$$
\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}
$$

$$
\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij} \\ \sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij}\\ ... & ... & ... & ...\\ \sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}
$$

The two matrices have identical diagonals, and since we are minimizing a trace, only the diagonal entries matter. Replacing the second part of $O$ with $Y^\top WY$ therefore does not change the result.

Let $O'=Y^\top DY-Y^\top WY$. Since $P^{-1}$ is diagonal, multiplying by $P^{-1}$ only rescales the diagonal entries, and $O'$ shares its diagonal with $O$, so $Tr(OP^{-1})=Tr(O'P^{-1})$.

We have thus worked out $O$, and shown that substituting $O'$ for $O$ achieves the same goal.
$$
O'=Y^\top DY-Y^\top WY
$$

Our final optimization problem becomes:
$$
\hat Y=\arg\underset{Y}{\min}\ Tr(Y^\top(D-W)Y(Y^\top DY)^{-1})=\arg\underset{Y}{\min}\ Tr(Y^\top LY(Y^\top DY)^{-1})
$$

where $L=D-W$ is the Laplacian matrix.
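The whole derivation can be checked numerically: the set-based Ncut formula and the matrix trace should agree for any partition. A sketch (weight matrix and labels made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
labels = np.array([0, 0, 1, 1, 2, 2, 0, 1])   # made-up 3-way partition of 8 samples
N, K = len(labels), 3
Y = np.zeros((N, K))
Y[np.arange(N), labels] = 1.0

A = rng.random((N, N))
W = (A + A.T) / 2                              # made-up symmetric weight matrix
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
D, L = np.diag(d), np.diag(d) - W

# Set-based definition: sum_k W(A_k, complement of A_k) / sum_{i in A_k} d_i
ncut = sum(W[np.ix_(labels == k, labels != k)].sum() / d[labels == k].sum()
           for k in range(K))
# Matrix form: Tr(Y^T L Y (Y^T D Y)^{-1})
trace = np.trace(Y.T @ L @ Y @ np.linalg.inv(Y.T @ D @ Y))
print(np.isclose(ncut, trace))                 # True
```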

We now study how to minimize $Tr(Y^\top LY(Y^\top DY)^{-1})$.

Here $Y\in\mathbb{R}^{N\times K}$ and each row is one-hot, indicating which cluster the corresponding sample belongs to. $Y$ looks like:
$$
\begin{bmatrix} 0 & ...& 0 & 1 & 0 &... & 0 \\ 0 & ...&1 & 0 & 0 &... & 0\\ ... & ...& ... & ... & ... & ... & ...\\ 0 & ...& 0 & 0 & 0 &... & 1\\ 0 & ...&0 & 1 & 0 &... & 0 \end{bmatrix}
$$

(exactly one 1 in every row)
Let:
$$
P=Y^\top DY=\mathrm{diag}\Big(\sum_{i\in A_1}d_i,\sum_{i\in A_2}d_i,...,\sum_{i\in A_K}d_i\Big)=\mathrm{diag}(p_1,p_2,...,p_K)
$$

The objective becomes (the last step uses the cyclic property of the trace):

$$
Tr(Y^\top LYP^{-1})=Tr(Y^\top LYP^{-\frac12}P^{-\frac12})=Tr(P^{-\frac12}Y^\top LYP^{-\frac12})
$$
Let:
$$
H=YP^{-\frac12},\qquad H^\top=P^{-\frac12}Y^\top
$$

Note that $Y^\top Y=\mathrm{diag}(N_1,...,N_K)$ (as computed above), so $H^\top H=P^{-\frac12}Y^\top YP^{-\frac12}=\mathrm{diag}(N_1/p_1,...,N_K/p_K)$, which is not the identity in general; the property we will actually use is $H^\top DH=I$, derived below.

The objective is now $Tr(H^\top LH)$.
Theorem 1:

For a positive semidefinite matrix $L$ with eigenvalues $0\le\lambda_1\le\lambda_2\le...\le\lambda_n$

and eigenbasis $\{\overline v_1,\overline v_2,...,\overline v_n\}$ (orthonormal, i.e., the eigenvectors after orthonormalization):

when $x\in\mathbb{R}^N$ and $x^\top x=1$, the minimum of $x^\top Lx$ is attained at $x=\overline v_1$.

proof:
$$
\begin{aligned}
&x \text{ can be expanded in the eigenbasis, since the eigenbasis is orthonormal:}\\
&x=c_1\overline v_1+c_2\overline v_2+...+c_n\overline v_n\\
&Lx=c_1\lambda_1\overline v_1+c_2\lambda_2\overline v_2+...+c_n\lambda_n\overline v_n\\
&\Rightarrow x^\top Lx=c_1^2\lambda_1+c_2^2\lambda_2+...+c_n^2\lambda_n\\
&\text{Since } x^\top x=1 \Rightarrow c_1^2+c_2^2+...+c_n^2=1\\
&\Rightarrow x^\top Lx=c_1^2\lambda_1+c_2^2\lambda_2+...+c_n^2\lambda_n\ge\lambda_1\\
&\text{with equality when } c_1^2=1,\ c_i=0\ (i\ne1)\ \Leftrightarrow\ x=\overline v_1\ \text{or}\ x=-\overline v_1
\end{aligned}
$$
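A quick numerical illustration of Theorem 1 (the random positive semidefinite matrix is made up for the demo): no random unit vector beats the smallest eigenvector.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.random((6, 6))
L = B @ B.T                              # a random positive semidefinite matrix

vals, vecs = np.linalg.eigh(L)           # ascending eigenvalues, orthonormal eigenvectors
v1, lam1 = vecs[:, 0], vals[0]

print(np.isclose(v1 @ L @ v1, lam1))     # x = v1 attains x^T L x = lambda_1
for _ in range(1000):
    x = rng.normal(size=6)
    x /= np.linalg.norm(x)               # enforce x^T x = 1
    assert x @ L @ x >= lam1 - 1e-10     # never below lambda_1
print("no unit vector beat lambda_1")
```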
Theorem 2:

For a positive semidefinite matrix $L$ with eigenvalues $0\le\lambda_1\le\lambda_2\le...\le\lambda_n$

and eigenbasis $\{\overline v_1,\overline v_2,...,\overline v_n\}$ (orthonormal, i.e., the eigenvectors after orthonormalization):

when $F\in\mathbb{R}^{N\times K}$ and $F^\top F=I$, the minimum of $Tr(F^\top LF)$ is attained at $F=[\overline v_1,\overline v_2,...,\overline v_K]$.

proof:
$$
\text{Denote } F=[f_1,f_2,...,f_K]\\
Tr(F^\top LF)=\sum_{i=1}^K f_i^\top Lf_i
$$

By Theorem 1, $Tr(F^\top LF)$ is minimized when $f_1=\overline v_1,\ f_2=\overline v_2,\ ...,\ f_K=\overline v_K$.
Because $F^\top F=I$, the columns of $F$ are orthonormal, so they cannot all be $\overline v_1$; they must be $K$ distinct orthonormal eigenvectors, and the minimum is achieved by the eigenvectors of the $K$ smallest eigenvalues.
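Theorem 2 can be illustrated the same way (again with a made-up positive semidefinite matrix): the bottom-$K$ eigenvectors achieve trace $\lambda_1+...+\lambda_K$, and no random orthonormal $F$ does better.

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.random((6, 6))
L = B @ B.T                                        # random positive semidefinite matrix
K = 2

vals, vecs = np.linalg.eigh(L)
F_opt = vecs[:, :K]                                # bottom-K eigenvectors
print(np.isclose(np.trace(F_opt.T @ L @ F_opt), vals[:K].sum()))

for _ in range(1000):
    Q, _ = np.linalg.qr(rng.normal(size=(6, K)))   # random F with F^T F = I
    assert np.trace(Q.T @ L @ Q) >= vals[:K].sum() - 1e-10
print("no orthonormal F beat lambda_1 + ... + lambda_K")
```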

The original objective $Tr(H^\top LH)$ does not come with the property $H^\top H=I$, so Theorem 2 cannot be applied directly; we therefore transform $H$.

$$
H^\top DH=P^{-\frac12}Y^\top DYP^{-\frac12}=P^{-\frac12}PP^{-\frac12}=I\\
\text{Let } F=D^{\frac12}H \Rightarrow F^\top F=(D^{\frac12}H)^\top D^{\frac12}H=H^\top D^{\frac12}D^{\frac12}H=H^\top DH=I\\
\text{Then } H=D^{-\frac12}F\\
\Rightarrow Tr(H^\top LH)=Tr(F^\top D^{-\frac12}LD^{-\frac12}F),\qquad F^\top F=I
$$
We thus arrive at the final optimization objective:
$$
Tr(F^\top D^{-\frac12}LD^{-\frac12}F),\qquad F^\top F=I
$$
By Theorem 2, the relaxed solution $F$ consists of the eigenvectors of the $K$ smallest eigenvalues of $D^{-\frac12}LD^{-\frac12}$; running k-means once more on the rows of the solved $F$ finally yields $Y$.
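For a cross-check, scikit-learn's `SpectralClustering` follows the same recipe of embedding with the normalized Laplacian and then clustering the embedding; the RBF affinity, `gamma`, and the toy data below are assumptions for the demo, not part of the derivation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Two well-separated blobs; the relaxation should recover them exactly
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(4, 0.3, (30, 2))])

model = SpectralClustering(n_clusters=2, affinity="rbf", gamma=1.0,
                           assign_labels="kmeans", random_state=0)
labels = model.fit_predict(X)
print(labels)   # the two blobs should come out as the two clusters
```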
