谱聚类

在多元变量统计中，谱聚类（英语：spectral clustering）技术利用数据相似矩阵的谱（特征值），在对数据进行降维后，以较少的维度进行聚类。相似矩阵作为输入，提供了对数据集中每一对点相对相似性的定量评估。

在图像分割中，谱聚类被称为基于分割的物体分类。

算法

基本算法

计算拉普拉斯矩阵 $L$ （或归一化的拉普拉斯矩阵）
计算前 $k$ 个特征向量（这些特征向量对应 $L$ 的 $k$ 个最小的特征值）
考虑由这 $k$ 个特征向量组成的矩阵，矩阵的第 $l$ 行定义了图节点 $l$ 的特征
根据这些特征对图节点进行聚类（例如使用k-均值聚类）

大型图的（归一化）拉普拉斯矩阵通常是病态的（ill-conditioned，即高条件数），这会减缓迭代求解的收敛速度。预处理（英语：Preconditioner#Preconditioning_for_eigenvalue_problems）（Preconditioning）可以加速收敛。通过首先确定结构，然后对群落进行聚类，谱聚类可以成功应用于大型图。^[1]

谱聚类与非线性降维密切相关，局部线性嵌入（Locally-linear embedding）等降维技术可用于减少噪声或异常值的误差。^[2]

软件

有不少大型开源项目实现谱聚类，包括Scikit-learn^[3]（使用带有多网格预处理（multigrid method）或ARPACK的LOBPCG算法），MLlib（使用幂迭代法，Power iteration method）进行伪特征向量聚类^[4]，以及R^[5]。

和其它聚类算法的关系

谱聚类算法它可以在核聚类方法的背景下进行描述，这显示了它与其他方法的相似之处。^[6]

和k-means聚类算法的关系

加权核K-means问题与谱聚类问题的目标函数相同，可以通过多级方法直接优化。

参考资料

^ Zare, Habil; P. Shooshtari; A. Gupta; R. Brinkman. Data reduction for spectral clustering to analyze high throughput flow cytometry data. BMC Bioinformatics. 2010, 11: 403. PMC 2923634 . PMID 20667133. doi:10.1186/1471-2105-11-403.
^ Arias-Castro, E. and Chen, G. and Lerman, G., Spectral clustering based on local linear approximations., Electronic Journal of Statistics, 2011, 5: 1537–1587, S2CID 88518155, arXiv:1001.1323 , doi:10.1214/11-ejs651
^ 2.3. Clustering. [2022-08-07]. （原始内容存档于2015-05-15）.
^ Clustering - RDD-based API - Spark 3.2.0 Documentation. [2022-08-07]. （原始内容存档于2017-07-03）.
^ Kernlab: Kernel-Based Machine Learning Lab. 12 November 2019 [2022-08-07]. （原始内容存档于2017-06-27）.
^ Filippone M., Camastra F., Masulli, F., Rovetta, S. A survey of kernel and spectral methods for clustering (PDF). Pattern Recognition. January 2008, 41 (1): 176–190 [2022-08-07]. Bibcode:2008PatRe..41..176F. doi:10.1016/j.patcog.2007.05.018. （原始内容存档 (PDF)于2022-08-16）.

[1] Zare, Habil; P. Shooshtari; A. Gupta; R. Brinkman. Data reduction for spectral clustering to analyze high throughput flow cytometry data. BMC Bioinformatics. 2010, 11: 403. PMC 2923634 . PMID 20667133. doi:10.1186/1471-2105-11-403.

[2] Arias-Castro, E. and Chen, G. and Lerman, G., Spectral clustering based on local linear approximations., Electronic Journal of Statistics, 2011, 5: 1537–1587, S2CID 88518155, arXiv:1001.1323 , doi:10.1214/11-ejs651

[3] 2.3. Clustering. [2022-08-07]. （原始内容存档于2015-05-15）.

[4] Clustering - RDD-based API - Spark 3.2.0 Documentation. [2022-08-07]. （原始内容存档于2017-07-03）.

[5] Kernlab: Kernel-Based Machine Learning Lab. 12 November 2019 [2022-08-07]. （原始内容存档于2017-06-27）.

[filippone2008survey-6] Filippone M., Camastra F., Masulli, F., Rovetta, S. A survey of kernel and spectral methods for clustering (PDF). Pattern Recognition. January 2008, 41 (1): 176–190 [2022-08-07]. Bibcode:2008PatRe..41..176F. doi:10.1016/j.patcog.2007.05.018. （原始内容存档 (PDF)于2022-08-16）.

[1]

[2]

[3]

[4]

[5]

[6]