Clustering¶
Multiview Spectral Clustering¶
- class mvlearn.cluster.MultiviewSpectralClustering(n_clusters=2, random_state=None, info_view=None, max_iter=10, n_init=10, affinity='rbf', gamma=None, n_neighbors=10)[source]¶
An implementation of multi-view spectral clustering.
An implementation of multi-view spectral clustering using the basic co-training framework as described in 1. Additionally, this can be effective when the dataset naturally contains features that are of 2 different data types, such as continuous features and categorical features 4, and then the original features are separated into two views in this way.
This algorithm can handle 2 or more views of data.
- Parameters
n_clusters (int) -- The number of clusters
random_state (int, optional, default=None) -- Determines random number generation for k-means.
info_view (int, optional, default=None) -- The most informative view. Must be between 0 and n_views-1 If given, then the final clustering will be performed on the designated view alone. Otherwise, the algorithm will concatenate across all views and cluster on the result.
max_iter (int, optional, default=10) -- The maximum number of iterations to run the clustering algorithm.
n_init (int, optional, default=10) -- The number of random initializations to use for k-means clustering.
affinity (string, optional, default='rbf') -- The affinity metric used to construct the affinity matrix. Options include 'rbf' (radial basis function), 'nearest_neighbors', and 'poly' (polynomial)
gamma (float, optional, default=None) -- Kernel coefficient for rbf and polynomial kernels. If None then gamma is computed as 1 / (2 * median(pair_wise_distances(X))^2) for each data view X.
n_neighbors (int, optional, default=10) -- Only used if nearest neighbors is selected for affinity. The number of neighbors to use for the nearest neighbors kernel.
- labels_¶
Cluster labels for each sample in the fitted data.
- Type
array-like, shape (n_samples)
- embedding_¶
The final spectral representation of the data to be used as input for the KMeans clustering step.
- Type
array-like, shape (n_samples, n_clusters)
Notes
Multi-view spectral clustering adapts the spectral clustering algorithm to applications where more than one view of data is available. This algorithm relies on the basic assumptions of the co-training, which are: (a) Sufficiency: each view is sufficient for classification on its own, (b) Compatibility: the target functions in both views predict the same labels for co-occurring features with high probability, and (c) Conditional independence: the views are conditionally independent given the class labels. In contrast to multi-view k-means clustering, multi-view spectral clustering performs well on arbitrary shaped clusters, and can therefore be readily used in applications where clusters are not expected to be convex. However multi-view spectral clustering tends to be computationally expensive unless the similarity graph for the data is sparse.
Multi-view spectral clustering works by using the spectral embedding from one view to constrain the similarity graph in the other view. By iteratively applying this procedure, the clustering of the two views tend to each other. Here we outline the algorithm for the Multi-view Spectral clustering algorithm for 2 views.
Multi-view Spectral Clustering Algorithm (for 2 views)
Input: Similarity matrix for both views: \(\mathbf{K}_1, \mathbf{K}_2\)
Output: Assignments to k clusters
Initialize: \(\mathbf{L}_v = \mathbf{D}_v^{-1/2} \mathbf{K}_v\mathbf{D}_v^{-1/2}\) for \(v = 1, 2\) \(\mathbf{U}_v^0\) is an \(n \times k\) matrix with the top k eigenvectors of \(\mathbf{L}_v\) for \(v = 1, 2\)
For \(i = 1\) to iter:
\(\mathbf{S}_1 = sym(\mathbf{U}_2^{i-1} {\mathbf{U}_2^{i-1}}^T\mathbf{K}_1)\)
\(\mathbf{S}_2 = sym(\mathbf{U}_1^{i-1} {\mathbf{U}_1^{i-1}}^T\mathbf{K}_2)\)
Use \(\mathbf{S}_1\) and \(\mathbf{S}_2\) as the new graph similarities and compute the Laplacians. Solve for the largest k eigenvectors to obtain \(\mathbf{U}_1^i\) and \(\mathbf{U}_2^i\).
Row-normalize \(\mathbf{U}_1^i\) and \(\mathbf{U}_2^i\).
Form matrix \(\mathbf{V} = \mathbf{U}_v^i\), where \(v\) is believed to be the most informative view a priori. If there is no prior knowledge on the view informativeness, matrix \(\mathbf{V}\) can also be set to the column-wise concatenation of the two \(\mathbf{U}_v^i\) s.
Assign example j to cluster c if the j-th row of \(\mathbf{V}\) is assigned to cluster c by the k-means algorithm.
References
- 1
Abhishek Kumar and Hal Daume. "A co-training approach for multi-view spectral clustering." In Proceedings of the 28th International Conference on Machine Learning, page 393–400. Omnipress, 2011.
Examples
>>> from mvlearn.datasets import load_UCImultifeature >>> from mvlearn.cluster import MultiviewSpectralClustering >>> from sklearn.metrics import normalized_mutual_info_score as nmi_score >>> # Get 5-class data >>> data, labels = load_UCImultifeature(select_labeled = list(range(5))) >>> mv_data = data[:2] # first 2 views only >>> mv_spectral = MultiviewSpectralClustering(n_clusters=5, ... random_state=10, n_init=100) >>> mv_clusters = mv_spectral.fit_predict(mv_data) >>> nmi = nmi_score(labels, mv_clusters) >>> print('{0:.3f}'.format(nmi)) 0.872
Co-Regularized Multiview Spectral Clustering¶
- class mvlearn.cluster.MultiviewCoRegSpectralClustering(n_clusters=2, v_lambda=2, random_state=None, info_view=None, max_iter=10, n_init=10, affinity='rbf', gamma=None, n_neighbors=10)[source]¶
An implementation of co-regularized multi-view spectral clustering based on an unsupervied version of the co-training framework. This algorithm uses the pairwise co-regularization scheme as described in 2. This algorithm can handle 2 or more views of data.
- Parameters
n_clusters (int) -- The number of clusters
v_lambda (float, optional, default=2) -- The regularization parameter. This parameter trades-off the spectral clustering objectives with the degree of agreement between each pair of views in the new representation. Must be a positive value.
random_state (int, optional, default=None) -- Determines random number generation for k-means.
info_view (int, optional, default=None) -- The most informative view. Must be between 0 and n_views-1 If given, then the final clustering will be performed on the designated view alone. Otherwise, the algorithm will concatenate across all views and cluster on the result.
max_iter (int, optional, default=10) -- The maximum number of iterations to run the clustering algorithm.
n_init (int, optional, default=10) -- The number of random initializations to use for k-means clustering.
affinity (string, optional, default='rbf') -- The affinity metric used to construct the affinity matrix. Options include 'rbf' (radial basis function), 'nearest_neighbors', and 'poly' (polynomial)
gamma (float, optional, default=None) -- Kernel coefficient for rbf and polynomial kernels. If None then gamma is computed as 1 / (2 * median(pair_wise_distances(X))^2) for each data view X.
n_neighbors (int, optional, default=10) -- Only used if nearest neighbors is selected for affinity. The number of neighbors to use for the nearest neighbors kernel.
- labels_¶
Cluster labels for each point.
- Type
array-like, shape (n_samples,)
- embedding_¶
The final spectral representation of the data to be used as input for the KMeans clustering step.
- Type
array-like, shape (n_samples, n_clusters)
- objective_¶
The value of the spectral clustering objective for each view at the end of each iteration.
- Type
array-like, shape (n_views, n_iterations)
Notes
In standard spectral clustering, the eigenvector matrix U for a given view is the new data representation to be used for the subsequent k-means clustering stage. In this algorithm, the objective function has been altered to encourage the pairwise similarities of examples under the new representation to be similar across all views.
The modified spectral clustering objective for the case of two views is shown and derived in [#4Clu]. In the clustering objective, the hyperparameter lambda trades-off the spectral clustering objectives and the disagreement term.
For a fixed lambda and n, the objective function is bounded from above and non-decreasing. As such, the algorithm is guaranteed to converge.
References
- 2
Abhishek Kumar, Piyush Rai, and Hal Daume. Co-regularized multi-view spectral cluster-ing. In Proceedings of the 24th International Conference on Neural Information Processing Systems, page 1413–1421. Curran Associates Inc., 2011.
Examples
>>> from mvlearn.datasets import load_UCImultifeature >>> from mvlearn.cluster import MultiviewCoRegSpectralClustering >>> from sklearn.metrics import normalized_mutual_info_score as nmi_score >>> # Get 5-class data >>> data, labels = load_UCImultifeature(select_labeled = list(range(5))) >>> mv_data = data[:2] # first 2 views only >>> mv_spectral = MultiviewCoRegSpectralClustering(n_clusters=5, ... random_state=10, n_init=100) >>> mv_clusters = mv_spectral.fit_predict(mv_data) >>> nmi = nmi_score(labels, mv_clusters, average_method='arithmetic') >>> print('{0:.3f}'.format(nmi)) 0.663
Multiview K Means¶
- class mvlearn.cluster.MultiviewKMeans(n_clusters=2, random_state=None, init='k-means++', patience=5, max_iter=300, n_init=5, tol=0.0001, n_jobs=None)[source]¶
This class implements multi-view k-means.
This class implements multi-view k-means using the co-EM framework as described in 3. This algorithm is most suitable for cases in which the different views of data are conditionally independent. Additionally, this can be effective when the dataset naturally contains features that are of 2 different data types, such as continuous features and categorical features 4, and then the original features are separated into two views in this way.
This algorithm currently handles two views of data.
- Parameters
n_clusters (int, optional, default=2) -- The number of clusters
random_state (int, optional, default=None) -- Determines random number generation for initializing centroids. Can seed the random number generator with an int.
init ({'k-means++', 'random'} or list of array-likes, default='k-means++') --
Method of initializing centroids.
'k-means++': selects initial cluster centers for k-means clustering via a method that speeds up convergence.
'random': choose n_cluster samples from the data for the initial centroids.
If a list of array-likes is passed, the list should have a length of equal to the number of views. Each of the array-likes should have the shape (n_clusters, n_features_i) for the ith view, where n_features_i is the number of features in the ith view of the input data.
patience (int, optional, default=5) -- The number of EM iterations with no decrease in the objective function after which the algorithm will terminate.
max_iter (int, optional, default=300) -- The maximum number of EM iterations to run before termination.
n_init (int, optional, default=5) -- Number of times the k-means algorithm will run on different centroid seeds. The final result will be the best output of n_init runs with respect to total inertia across all views.
tol (float, default=1e-4) -- Relative tolerance with regards to inertia to declare convergence.
n_jobs (int, default=None) -- The number of jobs to use for computation. This works by computing each of the n_init runs in parallel. None means 1. -1 means using all processors.
- labels_¶
Cluster labels for each sample in the fitted data.
- Type
array-like, shape (n_samples)
- centroids_¶
centroids_
length: n_viewscentroids_[i]
shape: (n_clusters, n_features_i)The cluster centroids for each of the two views.
centroids_[0]
corresponds to the centroids of view 1 andcentroids_[1]
corresponds to the centroids of view 2.- Type
list of array-likes
Notes
Multi-view k-means clustering adapts the traditional k-means clustering algorithm to handle two views of data. This algorithm requires that a conditional independence assumption between views holds true. In cases where both views are informative and conditionally independent, multi-view k-means clustering can outperform its single-view analog run on a concatenated version of the two views of data. This is quite useful for applications where you wish to cluster data from two different modalities or data with features that naturally fall into two different partitions. Multi-view k-means works by iteratively performing the maximization and expectation steps of traditional EM in one view, and then using the computed hidden variables as the input for the maximization step in the other view. This algorithm, referred to as Co-EM, is described below.
Co-EM Algorithm
Input: Unlabeled data D with 2 views
Initialize \(\Theta_0^{(2)}\), T, \(t = 0\).
E step for view 2: compute expectation for hidden variables given
Loop until stopping criterion is true:
For v = 1 ... 2:
\(t = t + 1\)
M step view v: Find model parameters \(\Theta_t^{(v)}\) that maximize the likelihood for the data given the expected values for hidden variables of view \(\overline{v}\) of iteration \(t\) - 1
E step view \(v\): compute expectation for hidden variables given the model parameters \(\Theta_t^{(v)}\)
return combined \(\hat{\Theta} = \Theta_{t-1}^{(1)} \cup \Theta_t^{(2)}\)
The final assignment of examples to partitions is performed by assigning each example to the cluster with the largest averaged posterior probability over both views.
References
- 3(1,2)
Steffen Bickel and Tobias Scheffer. Multi-view clustering. In Proceedings of the Fourth IEEE International Conference on Data Mining, page 19–26. IEEE Computer Society, 2004.
- 4(1,2,3)
Guoqing Chao, Shiliang Sun, and J. Bi. A survey on multi-view clustering.arXiv preprint, arXiv:1712.06246, 2017.
Examples
>>> from mvlearn.datasets import load_UCImultifeature >>> from mvlearn.cluster import MultiviewKMeans >>> from sklearn.metrics import normalized_mutual_info_score as nmi_score >>> # Get 5-class data >>> data, labels = load_UCImultifeature(select_labeled = list(range(5))) >>> mv_data = data[:2] # first 2 views only >>> mv_kmeans = MultiviewKMeans(n_clusters=5, random_state=10) >>> mv_clusters = mv_kmeans.fit_predict(mv_data) >>> nmi = nmi_score(labels, mv_clusters) >>> print('{0:.3f}'.format(nmi)) 0.770
""
Multiview Spherical K Means¶
- class mvlearn.cluster.MultiviewSphericalKMeans(n_clusters=2, random_state=None, init='k-means++', patience=5, max_iter=None, n_init=5, tol=0.0001, n_jobs=None)[source]¶
An implementation of multi-view spherical K-Means.
An implementation of multi-view spherical K-Means using the co-EM framework as described in 3. This algorithm is most suitable for cases in which the different views of data are conditionally independent. Additionally, this can be effective when the dataset naturally contains features that are of 2 different data types, such as continuous features and categorical features 4, and then the original features are separated into two views in this way.
This algorithm currently handles two views of data.
- Parameters
n_clusters (int, optional, default=2) -- The number of clusters
random_state (int, optional, default=None) -- Determines random number generation for initializing centroids. Can seed the random number generator with an int.
init ({'k-means++', 'random'} or list of array-likes, default='k-means++') --
Method of initializing centroids.
'k-means++': selects initial cluster centers for k-means clustering via a method that speeds up convergence.
'random': choose n_cluster samples from the data for the initial centroids.
If a list of array-likes is passed, the list should have a length of equal to the number of views. Each of the array-likes should have the shape (n_clusters, n_features_i) for the ith view, where n_features_i is the number of features in the ith view of the input data.
patience (int, optional, default=5) -- The number of EM iterations with no decrease in the objective function after which the algorithm will terminate.
max_iter (int, optional, default=None) -- The maximum number of EM iterations to run before termination.
n_init (int, optional, default=5) -- Number of times the k-means algorithm will run on different centroid seeds. The final result will be the best output of n_init runs with respect to total inertia across all views.
tol (float, default=1e-4) -- Relative tolerance with regards to inertia to declare convergence.
n_jobs (int, default=None) -- The number of jobs to use for computation. This works by computing each of the n_init runs in parallel. None means 1. -1 means using all processors.
- labels_¶
Cluster labels for each sample in the fitted data.
- Type
array-like, shape (n_samples)
- centroids_¶
centroids_
length: n_viewscentroids_[i]
shape: (n_clusters, n_features_i)The cluster centroids for each of the two views.
centroids_[0]
corresponds to the centroids of view 1 andcentroids_[1]
corresponds to the centroids of view 2.- Type
list of array-likes
Notes
Multi-view spherical k-means clustering adapts the traditional spherical kmeans clustering algorithm to handle two views of data. This algorithm is similar to the mult-view k-means algorithm, except it uses cosine distance instead of euclidean distance for the purposes of computing the optimization objective and making assignments. This algorithm requires that a conditional independence assumption between views holds true. In cases where both views are informative and conditionally independent, multi-view spherical k-means clustering can outperform its single-view analog run on a concatenated version of the two views of data. This is quite useful for applications where you wish to cluster data from two different modalities or data with features that naturally fall into two different partitions. Multi-view spherical k-means works by iteratively performing the maximization and expectation steps of traditional EM in one view, and then using the computed hidden variables as the input for the maximization step in the other view. This algorithm is described in the section for multi-view k-means clustering.
Examples
>>> from mvlearn.datasets import load_UCImultifeature >>> from mvlearn.cluster import MultiviewSphericalKMeans >>> from sklearn.metrics import normalized_mutual_info_score as nmi_score >>> # Get 5-class data >>> data, labels = load_UCImultifeature(select_labeled = list(range(5))) >>> mv_data = data[:2] # first 2 views only >>> mv_kmeans = MultiviewSphericalKMeans(n_clusters=5, random_state=5) >>> mv_clusters = mv_kmeans.fit_predict(mv_data) >>> # Compute nmi between true class labels and multi-view cluster labels >>> nmi = nmi_score(labels, mv_clusters) >>> print('{0:.3f}'.format(nmi)) 0.823