Embedding¶
Canonical Correlation Analysis (CCA)¶
- class mvlearn.embed.CCA(n_components=1, regs=None, signal_ranks=None, center=True, i_mcca_method='auto', multiview_output=True)[source]¶
Canonical Correlation Analysis (CCA)
CCA inherits from MultiCCA (MCCA) but is restricted to 2 views which allows for certain statistics to be computed about the results.
- Parameters
n_components (int (default 1)) -- Number of canonical components to compute and return.
regs (float | 'lw' | 'oas' | None, or list, optional (default None)) --
CCA regularization for each data view, which can be important for high dimensional data. A list will specify for each view separately. If float, must be between 0 and 1 (inclusive).
0 or None : corresponds to SUMCORR-AVGVAR MCCA.
1 : partial least squares SVD (generalizes to more than 2 views)
'lw' : Default sklearn.covariance.ledoit_wolf regularization
'oas' : Default sklearn.covariance.oas regularization
signal_ranks (int, None or list, optional (default None)) -- The initial signal rank to compute. If None, will compute the full SVD. A list will specify for each view separately.
center (bool, or list (default True)) -- Whether or not to initially mean center the data. A list will specify for each view separately.
i_mcca_method ('auto' | 'svd' | 'gevp' (default 'auto')) -- Whether to use the SVD based method (only available with no regularization) or the GEVP based method for informative MCCA.
multiview_output (bool, optional (default True)) -- If True, the .transform method returns one dataset per view. Otherwise, it returns a single dataset of shape (n_samples, n_components).
- means_¶
The means of each view, each of shape (n_features,)
- Type
list of numpy.ndarray
- loadings_¶
The loadings for each view used to project new data, each of shape (n_features_b, n_components).
- Type
list of numpy.ndarray
- common_score_norms_¶
Column norms of the sum of the fitted view scores. Used for projecting new data
- Type
numpy.ndarray, shape (n_components,)
- evals_¶
The generalized eigenvalue problem eigenvalues.
- Type
numpy.ndarray, shape (n_components,)
References
- 1
Kettenring, J. R., "Canonical Analysis of Several Sets of Variables." Biometrika, 58:433-451, (1971)
- 2
Tenenhaus, A., et al. "Regularized generalized canonical correlation analysis." Psychometrika, 76:257–284, 2011
Examples
>>> from mvlearn.embed import CCA
>>> X1 = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [3., 5., 4.]]
>>> X2 = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> cca = CCA()
>>> cca.fit([X1, X2])
CCA()
>>> Xs_scores = cca.transform([X1, X2])
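A short follow-up sketch (same toy data, illustrative only) of the documented multiview_output=False behavior, where transform returns a single array of shape (n_samples, n_components) rather than one dataset per view:
>>> cca = CCA(n_components=1, multiview_output=False)
>>> cca = cca.fit([X1, X2])
>>> scores = cca.transform([X1, X2])  # expected: a single (4, 1) array, per multiview_output=False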
Multiview Canonical Correlation Analysis (MCCA)¶
- class mvlearn.embed.MCCA(n_components=1, regs=None, signal_ranks=None, center=True, i_mcca_method='auto', multiview_output=True)[source]¶
Multiview canonical correlation analysis for any number of views. Includes options for regularized MCCA and informative MCCA (where a low rank PCA is first computed).
- Parameters
n_components (int | 'min' | 'max' | None (default 1)) -- Number of final components to compute. If int, will compute that many. If None, will compute as many as possible. 'min' and 'max' will respectively use the minimum/maximum number of features among views.
regs (float | 'lw' | 'oas' | None, or list, optional (default None)) --
MCCA regularization for each data view, which can be important for high dimensional data. A list will specify for each view separately. If float, must be between 0 and 1 (inclusive).
0 or None : corresponds to SUMCORR-AVGVAR MCCA.
1 : partial least squares SVD (generalizes to more than 2 views)
'lw' : Default sklearn.covariance.ledoit_wolf regularization
'oas' : Default sklearn.covariance.oas regularization
signal_ranks (int, None or list, optional (default None)) -- The initial signal rank to compute. If None, will compute the full SVD. A list will specify for each view separately.
center (bool, or list (default True)) -- Whether or not to initially mean center the data. A list will specify for each view separately.
i_mcca_method ('auto' | 'svd' | 'gevp' (default 'auto')) -- Whether to use the SVD based method (only available with no regularization) or the GEVP based method for informative MCCA.
multiview_output (bool, optional (default True)) -- If True, the .transform method returns one dataset per view. Otherwise, it returns a single dataset of shape (n_samples, n_components).
- means_¶
The means of each view, each of shape (n_features,)
- Type
list of numpy.ndarray
- loadings_¶
The loadings for each view used to project new data, each of shape (n_features_b, n_components).
- Type
list of numpy.ndarray
- common_score_norms_¶
Column norms of the sum of the fitted view scores. Used for projecting new data
- Type
numpy.ndarray, shape (n_components,)
- evals_¶
The generalized eigenvalue problem eigenvalues.
- Type
numpy.ndarray, shape (n_components,)
References
- 3
Kettenring, J. R., "Canonical Analysis of Several Sets of Variables." Biometrika, 58:433-451, (1971)
- 4
Tenenhaus, A., et al. "Regularized generalized canonical correlation analysis." Psychometrika, 76:257–284, 2011
Examples
>>> from mvlearn.embed import MCCA
>>> X1 = [[0, 0, 1], [1, 0, 0], [2, 2, 2], [3, 5, 4]]
>>> X2 = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> X3 = [[0, 1, 0], [1, 9, 0], [4, 3, 3], [12, 8, 10]]
>>> mcca = MCCA()
>>> mcca.fit([X1, X2, X3])
MCCA()
>>> Xs_scores = mcca.transform([X1, X2, X3])
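A hedged sketch of the regs option documented above, on random high dimensional views (the data and values are illustrative only); a list gives one setting per view, here Ledoit-Wolf regularization for the first view and a float between 0 and 1 for the others:
>>> import numpy as np
>>> from mvlearn.embed import MCCA
>>> rng = np.random.RandomState(0)
>>> Xs = [rng.normal(size=(20, 50)), rng.normal(size=(20, 40)), rng.normal(size=(20, 30))]
>>> mcca = MCCA(n_components=2, regs=['lw', 0.5, 0.5])
>>> mcca = mcca.fit(Xs)
>>> Xs_scores = mcca.transform(Xs)  # one (20, 2) array per view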
Kernel MCCA¶
- class mvlearn.embed.KMCCA(n_components=1, kernel='linear', kernel_params={}, regs=None, signal_ranks=None, sval_thresh=0.001, diag_mode='A', center=True, filter_params=False, n_jobs=None, multiview_output=True, pgso=False, tol=0.1)[source]¶
Kernel multi-view canonical correlation analysis.
- Parameters
n_components (int, None (default 1)) -- Number of components to compute. If None, will use the number of features.
kernel (str, callable, or list (default 'linear')) -- The kernel function to use. This is the metric argument to sklearn.metrics.pairwise.pairwise_kernels. A list will specify for each view separately.
kernel_params (dict, or list (default {})) -- Keyword arguments to sklearn.metrics.pairwise.pairwise_kernels. A list will specify for each view separately.
regs (float, None, or list, optional (default None)) -- None equates to 0. Floats are nonnegative. The value is used to regularize singular values in each view based on diag_mode. A list will specify the value for each view separately.
signal_ranks (int, None, or list, optional (default None)) -- Largest SVD rank to compute for each view. If None, the full rank decomposition will be used. A list will specify for each view separately.
sval_thresh (float, or list (default 1e-3)) -- For each view, singular values of (1/n)K (the gram matrix scaled by n_samples) below this threshold are discarded. A nonzero value handles singular gram matrices.
diag_mode ('A' | 'B' | 'C' (default 'A')) -- Method of regularizing the singular values s with regularization parameter r.
center (bool, or list (default True)) -- Whether or not to initially mean center the data. A list will specify for each view separately.
filter_params (bool (default False)) -- See the sklearn.metrics.pairwise.pairwise_kernels documentation.
n_jobs (int, None, optional (default None)) -- Number of jobs to run in parallel when computing kernel matrices. See the sklearn.metrics.pairwise.pairwise_kernels documentation.
multiview_output (bool, optional (default True)) -- If True, the .transform method returns one dataset per view. Otherwise, it returns a single dataset of shape (n_samples, n_components).
pgso (bool, optional (default False)) -- If True, computes a partial Gram-Schmidt orthogonalization approximation of the kernel matrices to the given tolerance.
tol (float (default 0.1)) -- The minimum matrix trace difference between a kernel matrix and its computed pgso approximation, relative to the kernel trace.
- kernel_col_means_¶
The column means of each gram matrix
- Type
list of numpy.ndarray, shape (n_samples,)
- dual_vars_¶
The loadings for the gram matrix of each view
- Type
numpy.ndarray, shape (n_views, n_samples, n_components)
- common_score_norms_¶
Column norms of the sum of the view scores. Useful for projecting new data
- Type
numpy.ndarray, shape (n_components,)
- evals_¶
The generalized eigenvalue problem eigenvalues.
- Type
numpy.ndarray, shape (n_components,)
- Xs_¶
Xs[i] shape (n_samples, n_features_i)
The original data matrices, used for kernel matrix computation during calls to .transform.
- Type
list of numpy.ndarray, length (n_views,)
- pgso_Ls_¶
pgso_Ls_[i] shape (n_samples, rank_i)
The Gram-Schmidt approximations of the kernel matrices
- Type
list of numpy.ndarray, length (n_views,)
- pgso_norms_¶
pgso_norms_[i] shape (rank_i,)
The maximum norms found during the Gram-Schmidt procedure, descending
- Type
list of numpy.ndarray, length (n_views,)
- pgso_idxs_¶
pgso_idxs_[i] shape (rank_i,)
The sample indices of the maximum norms
- Type
list of numpy.ndarray, length (n_views,)
- pgso_Xs_¶
pgso_Xs_[i] shape (rank_i, n_features)
The samples with indices saved in pgso_idxs_, sorted by pgso_norms_
- Type
list of numpy.ndarray, length (n_views,)
- pgso_ranks_¶
The ranks of the partial Gram-Schmidt results for each view.
- Type
list, length (n_views,)
Notes
Traditional CCA aims to find useful projections of features in each view of data, computing a weighted sum, but may not extract useful descriptors of the data because of its linearity. KMCCA offers an alternative solution by first projecting the data onto a higher dimensional feature space.
\[\phi: \mathbf{x} = (x_1,...,x_m) \mapsto \phi(\mathbf{x}) = (z_1,...,z_N), \quad m \ll N\]
before performing CCA in the new feature space.
Kernels are effectively distance functions that compute inner products in the higher dimensional feature space, a method known as the kernel trick. A kernel function \(K\) satisfies, for all \(\mathbf{x}, \mathbf{z} \in X\),
\[K(\mathbf{x}, \mathbf{z}) = \langle\phi(\mathbf{x}), \phi(\mathbf{z})\rangle.\]
The kernel matrix \(K_i\) has entries computed from the kernel function. Using the kernel trick, loadings of the kernel matrix (dual_vars_) are solved for rather than loadings of the features from \(\phi\).
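As a small illustration of the kernel trick using the same sklearn.metrics.pairwise.pairwise_kernels function named in the parameters above (the data and gamma value are arbitrary), the RBF kernel entries equal \(\exp(-\gamma\|\mathbf{x}-\mathbf{z}\|^2)\), i.e. inner products in an implicit feature space:
>>> import numpy as np
>>> from sklearn.metrics.pairwise import pairwise_kernels
>>> X = np.array([[0., 1.], [1., 0.], [2., 2.]])
>>> K = pairwise_kernels(X, metric='rbf', gamma=0.5)
>>> D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
>>> np.allclose(K, np.exp(-0.5 * D2))
True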
Kernel matrices grow quadratically with the size of the data: not only must \(n^2\) elements be stored, but the matrix eigenvalue problems they pose are costly. In a Cholesky decomposition a positive definite matrix \(K\) is decomposed into a lower triangular matrix \(L\): \(K = LL^T\).
The dual partial Gram-Schmidt orthogonalization (PGSO) is equivalent to the Incomplete Cholesky Decomposition (ICD), which looks for a low rank approximation of \(L\), reducing the cost of matrix operations, such that \(\frac{1}{\sum_i K_{ii}} tr(K - LL^T) \leq tol\).
A PGSO tolerance yielding rank \(m\) reduces the storage requirement from \(O(n^2)\) to \(O(mn)\) and the computational cost from \(O(n^3)\) to \(O(nm^2)\) 7.
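A hedged usage sketch of the pgso option documented above (random data; the rbf kernel and tol value are arbitrary choices, and fit/transform follow the class's standard API):
>>> import numpy as np
>>> from mvlearn.embed import KMCCA
>>> rng = np.random.RandomState(0)
>>> Xs = [rng.normal(size=(100, 10)), rng.normal(size=(100, 8))]
>>> kmcca = KMCCA(n_components=2, kernel='rbf', pgso=True, tol=0.01)
>>> kmcca = kmcca.fit(Xs)
>>> scores = kmcca.transform(Xs)
>>> ranks = kmcca.pgso_ranks_  # approximation rank found for each view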
References
- 5
Hardoon D., et al. "Canonical Correlation Analysis: An Overview with Application to Learning Methods", Neural Computation, Volume 16 (12), pp 2639-2664, 2004.
- 6
Bach, F. and Jordan, M. "Kernel Independent Component Analysis." Journal of Machine Learning Research, 3:1-48, 2002.
- 7(1,2)
Kuss, M. and Graepel, T.. "The Geometry of Kernel Canonical Correlation Analysis." MPI Technical Report, 108. (2003).
Examples
>>> from mvlearn.embed import KMCCA
>>> X1 = [[0, 0, 1], [1, 0, 0], [2, 2, 2], [3, 5, 4]]
>>> X2 = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> X3 = [[0, 1, 0], [1, 9, 0], [4, 3, 3], [12, 8, 10]]
>>> kmcca = KMCCA()
>>> kmcca.fit([X1, X2, X3])
KMCCA()
>>> Xs_scores = kmcca.transform([X1, X2, X3])
Generalized Canonical Correlation Analysis (GCCA)¶
- class mvlearn.embed.GCCA(n_components=None, fraction_var=None, sv_tolerance=None, n_elbows=2, tall=False, max_rank=False, n_jobs=None)[source]¶
An implementation of Generalized Canonical Correlation Analysis 8, made suitable for cases where the number of features exceeds the number of samples by first applying single-view dimensionality reduction. Computes individual projections into a common subspace such that the correlations between pairwise projections are maximized. Note that this is applicable to any number of views, not just two.
- Parameters
n_components (int (positive), optional, default=None) -- If self.sv_tolerance=None, selects the number of SVD components to keep for each view. If None, another selection method is used.
fraction_var (float, default=None) -- If self.sv_tolerance=None and self.n_components=None, selects the number of SVD components to keep for each view by capturing enough of the variance. If None, another selection method is used.
sv_tolerance (float, optional, default=None) -- Selects the number of SVD components to keep for each view by thresholding singular values. If None, another selection method is used.
n_elbows (int, optional, default: 2) -- If self.fraction_var=None, self.sv_tolerance=None, and self.n_components=None, then compute the optimal embedding dimension using utils.select_dimension(). Otherwise, ignored.
tall (boolean, default=False) -- Set to True if n_samples > n_features; this speeds up the SVD.
max_rank (boolean, default=False) -- If True, sets the rank of the common latent space as the maximum rank of the individual spaces. If False, uses the minimum individual rank.
n_jobs (int (positive), default=None) -- The number of jobs to run in parallel when computing the SVDs for each view in fit and partial_fit. None means 1 job, -1 means using all processors.
- projection_mats_¶
A projection matrix for each view, from the given space to the latent space
- Type
list of arrays
- ranks_¶
Number of left singular vectors kept for each view during the first SVD
- Type
list of ints
Notes
Consider two views \(X_1\) and \(X_2\). Canonical Correlation Analysis seeks to find vectors \(a_1\) and \(a_2\) to maximize the correlation between \(X_1 a_1\) and \(X_2 a_2\), expanded below.
\[\left(\frac{a_1^TC_{12}a_2}{\sqrt{a_1^TC_{11}a_1 a_2^TC_{22}a_2}}\right)\]
where \(C_{11}\), \(C_{22}\), and \(C_{12}\) are respectively the view 1, view 2, and between-view covariance matrix estimates. GCCA maximizes the sum of these correlations across all pairwise views and computes a set of linearly independent components. This specific algorithm first applies principal component analysis (PCA) independently to each view and then aligns the most informative projections to find correlated and informative subspaces. Parameters that control the embedding dimension apply to the PCA step. The dimension of each aligned subspace is the maximum or minimum of the individual dimensions, per the max_rank parameter. Using the maximum will capture the most information from all views but also noise from some views. Using the minimum will better remove noise dimensions but at the cost of information from some views.
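To make the objective concrete, here is a small NumPy illustration (not mvlearn code; the data and direction vectors are made up) that evaluates the pairwise correlation above for fixed vectors and checks it against the Pearson correlation of the projections:
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1, X2 = rng.normal(size=(50, 4)), rng.normal(size=(50, 3))
>>> a1, a2 = rng.normal(size=4), rng.normal(size=3)
>>> X1c, X2c = X1 - X1.mean(0), X2 - X2.mean(0)
>>> C11, C22 = X1c.T @ X1c / 49, X2c.T @ X2c / 49  # within-view covariances
>>> C12 = X1c.T @ X2c / 49                          # between-view covariance
>>> corr = (a1 @ C12 @ a2) / np.sqrt((a1 @ C11 @ a1) * (a2 @ C22 @ a2))
>>> np.isclose(corr, np.corrcoef(X1 @ a1, X2 @ a2)[0, 1])
True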
References
- 8
B. Afshin-Pour, G.A. Hossein-Zadeh, S.C. Strother, H. Soltanian-Zadeh. "Enhancing reproducibility of fMRI statistical maps using generalized canonical correlation analysis in NPAIRS framework." Neuroimage, volume 60, pp. 1970-1981, 2012
Examples
>>> from mvlearn.datasets import load_UCImultifeature
>>> from mvlearn.embed import GCCA
>>> # Load full dataset, labels not needed
>>> Xs, _ = load_UCImultifeature()
>>> gcca = GCCA(fraction_var=0.9)
>>> # Transform the first 5 views
>>> Xs_latents = gcca.fit_transform(Xs[:5])
>>> print([X.shape[1] for X in Xs_latents])
[9, 9, 9, 9, 9]
Deep Canonical Correlation Analysis (DCCA)¶
- class mvlearn.embed.DCCA(input_size1=None, input_size2=None, n_components=2, layer_sizes1=None, layer_sizes2=None, use_all_singular_values=False, device=device(type='cpu'), epoch_num=200, batch_size=800, learning_rate=0.001, reg_par=1e-05, tolerance=0.001, print_train_log_info=False)[source]¶
An implementation of Deep Canonical Correlation Analysis 9 with PyTorch. It computes projections into a common subspace in order to maximize the correlation between pairwise projections into the subspace from two views of data. To obtain these projections, two fully connected deep networks are trained to initially transform the two views of data. Then, the transformed data is projected using linear CCA. This can be thought of as training a kernel for each view that initially acts on the data before projection. The networks are trained to maximize the ability of the linear CCA to maximize the correlation between the final dimensions.
- Parameters
input_size1 (int (positive)) -- The dimensionality of the input vectors in view 1.
input_size2 (int (positive)) -- The dimensionality of the input vectors in view 2.
n_components (int (positive), default=2) -- The output dimensionality of the correlated projections. The deep network will transform the data to this size. Must satisfy: n_components <= max(layer_sizes1[-1], layer_sizes2[-1]).
layer_sizes1 (list of ints, default=None) -- The sizes of the layers of the deep network applied to view 1 before CCA. For example, if the input dimensionality is 256, and there is one hidden layer with 1024 units and the output dimensionality is 100 before applying CCA, layer_sizes1=[1024, 100]. If None, set to [1000, self.n_components_].
layer_sizes2 (list of ints, default=None) -- The sizes of the layers of the deep network applied to view 2 before CCA. Does not need to have the same hidden layer architecture as layer_sizes1, but the final dimensionality must be the same. If None, set to [1000, self.n_components_].
use_all_singular_values (boolean (default=False)) -- Whether or not to use all the singular values in the CCA computation to calculate the loss. If False, only the top n_components singular values are used.
device (string, default='cpu') -- The torch device for processing. Can be used with a GPU if available.
epoch_num (int (positive), default=200) -- The maximum number of epochs to train the deep networks.
batch_size (int (positive), default=800) -- Batch size for training the deep networks.
learning_rate (float (positive), default=1e-3) -- Learning rate for training the deep networks.
reg_par (float (positive), default=1e-5) -- Weight decay parameter used in the RMSprop optimizer.
tolerance (float (positive), default=1e-3) -- Threshold difference between successive iteration losses to define convergence and stop training.
print_train_log_info (boolean, default=False) -- If True, the training loss at each epoch will be printed to the console when DCCA.fit() is called.
- n_components_¶
The output dimensionality of the correlated projections. The deep network will transform the data to this size. If not specified, will be set to 2.
- Type
int (positive)
- layer_sizes1_¶
The sizes of the layers of the deep network applied to view 1 before CCA. For example, if the input dimensionality is 256, and there is one hidden layer with 1024 units and the output dimensionality is 100 before applying CCA, layer_sizes1=[1024, 100].
- Type
list of ints
- layer_sizes2_¶
The sizes of the layers of the deep network applied to view 2 before CCA. Does not need to have the same hidden layer architecture as layer_sizes1, but the final dimensionality must be the same.
- Type
list of ints
- device_¶
The torch device for processing.
- Type
string
- deep_model_¶
2 view Deep CCA object used to transform 2 views of data together.
- Type
DeepPairedNetworks
object
- linear_cca_¶
Linear CCA object used to project the final transformations from the output of deep_model to n_components.
- Type
linear_cca
object
- model_¶
Wrapper around deep_model to allow parallelisation.
- Type
torch.nn.DataParallel object
- loss_¶
Loss function for deep_model. Defined as the negative correlation between outputs of the transformed views.
- Type
cca_loss
object
- optimizer_¶
Optimizer used to train the networks.
- Type
torch.optim.RMSprop object
- Raises
ModuleNotFoundError -- In order to run DCCA, PyTorch and certain other optional dependencies must be installed. See the installation page for details.
Notes
Deep Canonical Correlation Analysis is a method of finding highly correlated subspaces for 2 views of data using nonlinear transformations learned by deep networks. It can be thought of as using deep networks to learn the best potentially nonlinear kernels for a variant of kernel CCA.
The networks used for each view in DCCA consist of fully connected linear layers with a sigmoid activation function.
The DCCA problem is formulated following 9. Consider two views \(X_1\) and \(X_2\). DCCA seeks to find the parameters for each view, \(\Theta_1\) and \(\Theta_2\), such that they maximize
\[\text{corr}\left(f_1\left(X_1;\Theta_1\right), f_2\left(X_2;\Theta_2\right)\right)\]
These parameters are estimated by gradient descent on the input data. Take \(H_1, H_2 \in \mathbb{R}^{o \times m}\) to be the outputs of the deep networks, one column per input sample (\(m\) samples total), and form the centered matrices \(\bar{H}_1 = H_1-\frac{1}{m}H_1\mathbf{1}\) and \(\bar{H}_2 = H_2-\frac{1}{m}H_2\mathbf{1}\). Then, define
\[\begin{split}\begin{align*} \hat{\Sigma}_{12} &= \frac{1}{m-1}\bar{H}_1\bar{H}_2^T \\ \hat{\Sigma}_{11} &= \frac{1}{m-1}\bar{H}_1\bar{H}_1^T + r_1I \\ \hat{\Sigma}_{22} &= \frac{1}{m-1}\bar{H}_2\bar{H}_2^T + r_2I \end{align*}\end{split}\]
where \(r_1\) and \(r_2\) are regularization constants \(>0\) so the matrices are guaranteed to be positive definite.
The correlation objective function is the sum of the top \(k\) singular values of the matrix \(T\), where
\[T = \hat{\Sigma}_{11}^{-1/2}\hat{\Sigma}_{12}\hat{\Sigma}_{22}^{-1/2}\]
which, when all singular values are used, equals the matrix trace norm of \(T\). Thus, the loss is
\[L(X_1, X_2) = -\text{corr}\left(H_1, H_2\right) = -\text{tr}(T^TT)^{1/2}.\]
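As an illustration of this objective (a NumPy sketch, not mvlearn's implementation; it assumes SciPy is available for the matrix inverse square roots), the loss for given network outputs \(H_1, H_2\) can be evaluated directly:
>>> import numpy as np
>>> from scipy.linalg import fractional_matrix_power
>>> rng = np.random.RandomState(0)
>>> o, m, r = 4, 100, 1e-4
>>> H1, H2 = rng.normal(size=(o, m)), rng.normal(size=(o, m))
>>> H1c = H1 - H1.mean(axis=1, keepdims=True)  # center each row
>>> H2c = H2 - H2.mean(axis=1, keepdims=True)
>>> S12 = H1c @ H2c.T / (m - 1)
>>> S11 = H1c @ H1c.T / (m - 1) + r * np.eye(o)
>>> S22 = H2c @ H2c.T / (m - 1) + r * np.eye(o)
>>> T = fractional_matrix_power(S11, -0.5) @ S12 @ fractional_matrix_power(S22, -0.5)
>>> loss = -np.linalg.svd(T, compute_uv=False).sum()  # -tr((T^T T)^{1/2})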
Examples
>>> from mvlearn.embed import DCCA
>>> import numpy as np
>>> # Exponential data as example of finding good correlation
>>> view1 = np.random.normal(loc=2, size=(1000, 75))
>>> view2 = np.exp(view1)
>>> view1_test = np.random.normal(loc=2, size=(200, 75))
>>> view2_test = np.exp(view1_test)
>>> input_size1, input_size2 = 75, 75
>>> n_components = 2
>>> layer_sizes1 = [1024, 4]
>>> layer_sizes2 = [1024, 4]
>>> dcca = DCCA(input_size1, input_size2, n_components, layer_sizes1,
...             layer_sizes2)
>>> dcca = dcca.fit([view1, view2])
>>> outputs = dcca.transform([view1_test, view2_test])
>>> print(outputs[0].shape)
(200, 2)
References
- 9(1,2,3)
Andrew, G., Arora, R., Bilmes, J., and Livescu, K. "Deep Canonical Correlation Analysis." In Proceedings of the 30th International Conference on Machine Learning, 28:1247-1255, 2013
Multiview Multidimensional Scaling¶
- class mvlearn.embed.MVMDS(n_components=2, num_iter=15, dissimilarity='euclidean')[source]¶
An implementation of Classical Multiview Multidimensional Scaling for jointly reducing the dimensions of multiple views of data 10. A Euclidean distance matrix is created for each view, double centered, and the k largest common eigenvectors between the matrices are found based on the stepwise estimation of common principal components. Using these common principal components, the views are jointly reduced and a single view of k-dimensions is returned.
MVMDS is often a better alternative to PCA for multi-view data. See the tutorials in the documentation.
- Parameters
n_components (int (positive), default=2) -- Represents the number of components that the user would like to be returned from the algorithm. This value must be greater than 0 and less than the number of samples within each view.
num_iter (int (positive), default=15) -- Number of iterations stepwise estimation goes through.
dissimilarity ({'euclidean', 'precomputed'}, default='euclidean') --
Dissimilarity measure to use:
'euclidean': Pairwise Euclidean distances between points in the dataset.
'precomputed': Xs is treated as pre-computed dissimilarity matrices.
- components_¶
Joint transformed MVMDS components of the input views.
- Type
numpy.ndarray, shape(n_samples, n_components)
Notes
Classical Multiview Multidimensional Scaling can be broken down into two steps. The first step involves calculating the Euclidean Distance matrices, \(Z_i\), for each of the \(k\) views and double-centering these matrices through the following calculations:
\[\Sigma_{i}=-\frac{1}{2}J_iZ_iJ_i \qquad \text{where } J_i=I_i-\frac{1}{n}\mathbb{1}\mathbb{1}^T\]
The second step involves finding the common principal components of the \(\Sigma\) matrices. These can be thought of as multiview generalizations of the principal components found in principal component analysis (PCA), given several covariance matrices. The central hypothesis of the common principal component model states that given \(k\) normal populations (views), their \(p \times p\) covariance matrices \(\Sigma_{i}\), for \(i = 1,2,...,k\), are simultaneously diagonalizable as:
\[\Sigma_{i} = QD_i^2Q^T\]
where \(Q\) is the common \(p \times p\) orthogonal matrix and the \(D_i^2\) are positive \(p \times p\) diagonal matrices. The \(Q\) matrix contains all the common principal components. The common principal component, \(q_j\), is found by solving the minimization problem:
\[\text{Minimize} \sum_{i=1}^{k}n_i\log(q_j^TS_iq_j) \qquad \text{subject to } q_j^Tq_j = 1\]
where the \(n_i\) represent the degrees of freedom and the \(S_i\) represent the sample covariance matrices.
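The first (double-centering) step can be illustrated with a short NumPy sketch (illustrative only, not mvlearn code); it assumes, as in classical MDS, that the distance matrix holds squared Euclidean distances, in which case the result equals the centered Gram matrix:
>>> import numpy as np
>>> from scipy.spatial.distance import pdist, squareform
>>> rng = np.random.RandomState(0)
>>> X = rng.normal(size=(6, 3))
>>> Z = squareform(pdist(X, metric='sqeuclidean'))  # squared distance matrix
>>> n = Z.shape[0]
>>> J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
>>> Sigma = -0.5 * J @ Z @ J
>>> Xc = X - X.mean(axis=0)
>>> np.allclose(Sigma, Xc @ Xc.T)
True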
This class does not support MVMDS.transform() due to the iterative nature of the algorithm and the fact that the transformation is done during iterative fitting. Use MVMDS.fit_transform() to do both fitting and transforming at once.
Examples
>>> from mvlearn.embed import MVMDS
>>> from mvlearn.datasets import load_UCImultifeature
>>> Xs, _ = load_UCImultifeature()
>>> print(len(Xs)) # number of views
6
>>> print(Xs[0].shape) # number of samples and features in one view
(2000, 76)
>>> mvmds = MVMDS(n_components=5)
>>> Xs_reduced = mvmds.fit_transform(Xs)
>>> print(Xs_reduced.shape)
(2000, 5)
References
- 10
Trendafilov, Nickolay T. “Stepwise Estimation of Common Principal Components.” Computational Statistics & Data Analysis, 54(12):3446–3457, 2010
- 11
Kanaan-Izquierdo, Samir, et al. "Multiview: a software package for multiview pattern recognition methods." Bioinformatics, 35(16):2877–2879, 2019
Split Autoencoder¶
- class mvlearn.embed.SplitAE(hidden_size=64, num_hidden_layers=2, embed_size=20, training_epochs=10, batch_size=16, learning_rate=0.001, print_info=False, print_graph=True)[source]¶
Implements an autoencoder that creates an embedding of a view View1 and from that embedding reconstructs View1 and another view View2, as described in 12.
- Parameters
hidden_size (int (default=64)) -- number of nodes in the hidden layers
num_hidden_layers (int (default=2)) -- number of hidden layers in each encoder or decoder net
embed_size (int (default=20)) -- size of the bottleneck vector in the autoencoder
training_epochs (int (default=10)) -- how many times the network trains on the full dataset
batch_size (int (default=16):) -- batch size while training the network
learning_rate (float (default=0.001)) -- learning rate of the Adam optimizer
print_info (bool (default=False)) -- whether or not to print errors as the network trains.
print_graph (bool (default=True)) -- whether or not to graph training loss
- view1_encoder_¶
the View1 embedding network as a PyTorch module
- Type
torch.nn.Module
- view1_decoder_¶
the View1 decoding network as a PyTorch module
- Type
torch.nn.Module
- view2_decoder_¶
the View2 decoding network as a PyTorch module
- Type
torch.nn.Module
- Raises
ModuleNotFoundError -- In order to run SplitAE, PyTorch and certain other optional dependencies must be installed. See the installation page for details.
Notes
In the following, \(\textbf{x}\) is View1 and \(\textbf{y}\) is View2.
Each encoder / decoder network is a fully connected neural net with parameter count equal to:
\[\left(\text{input\_size} + \text{embed\_size}\right) \cdot \text{hidden\_size} + \sum_{1}^{\text{num\_hidden\_layers}-1}\text{hidden\_size}^2\]
where \(\text{input\_size}\) is the number of features in View1 or View2.
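As a quick arithmetic check of this formula (illustrative only; input_size=100 is a hypothetical value, the other values are the class defaults):
>>> hidden_size, num_hidden_layers, embed_size, input_size = 64, 2, 20, 100
>>> (input_size + embed_size) * hidden_size + (num_hidden_layers - 1) * hidden_size ** 2
11776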
The loss that is reduced via gradient descent is:
\[J = \left(p(f(\textbf{x})) - \textbf{x}\right)^2 + \left(q(f(\textbf{x})) - \textbf{y}\right)^2\]
where \(f\) is the encoder, \(p\) and \(q\) are the decoders, \(\textbf{x}\) is View1, and \(\textbf{y}\) is View2.
References
- 12
Wang, Weiran, et al. "On Deep Multi-View Representation Learning." In Proceedings of the 32nd International Conference on Machine Learning, 37:1083-1092, 2015.
For more extensive examples, see the tutorials for SplitAE in this documentation.
DCCA Utilities¶
- class mvlearn.embed.linear_cca[source]¶
Implementation of linear CCA to act on the output of the deep networks in DCCA.
Consider two views \(X_1\) and \(X_2\). Canonical Correlation Analysis seeks to find vectors \(a_1\) and \(a_2\) to maximize the correlation between \(X_1 a_1\) and \(X_2 a_2\).
- class mvlearn.embed.cca_loss(n_components, use_all_singular_values, device)[source]¶
An implementation of the loss function of linear CCA as introduced in the original paper for DCCA 9. Details of how this loss is computed can be found in the paper or in the documentation for DCCA.
- Parameters
n_components (int (positive)) -- The output dimensionality of the CCA transformation.
use_all_singular_values (boolean) -- Whether or not to use all the singular values in the loss calculation. If False, only use the top n_components singular values.
device (torch.device object) -- The torch device being used in DCCA.
- use_all_singular_values_¶
Whether or not to use all the singular values in the loss calculation. If False, only use the top n_components singular values.
- Type
boolean
- device_¶
The torch device being used in DCCA.
- Type
torch.device object
- class mvlearn.embed.MlpNet(layer_sizes, input_size)[source]¶
Multilayer perceptron implementation for a fully connected network. Used by DCCA for the transformation of a single view before linear CCA. Extends torch.nn.Module.
- Parameters
layer_sizes (list of ints) -- The sizes of the layers of the deep network applied to view 1 before CCA. For example, if the input dimensionality is 256, and there is one hidden layer with 1024 units and the output dimensionality is 100 before applying CCA, layer_sizes1=[1024, 100].
input_size (int (positive)) -- The dimensionality of the input vectors to the deep network.
- layers_¶
The layers in the network.
- Type
torch.nn.ModuleList object
- class mvlearn.embed.DeepPairedNetworks(layer_sizes1, layer_sizes2, input_size1, input_size2, n_components, use_all_singular_values, device=device(type='cpu'))[source]¶
A pair of deep networks for operating on the two views of data. Consists of two MlpNet objects for transforming 2 views of data in DCCA. Extends torch.nn.Module.
- Parameters
layer_sizes1 (list of ints) -- The sizes of the layers of the deep network applied to view 1 before CCA. For example, if the input dimensionality is 256, and there is one hidden layer with 1024 units and the output dimensionality is 100 before applying CCA, layer_sizes1=[1024, 100].
layer_sizes2 (list of ints) -- The sizes of the layers of the deep network applied to view 2 before CCA. Does not need to have the same hidden layer architecture as layer_sizes1, but the final dimensionality must be the same.
input_size1 (int (positive)) -- The dimensionality of the input vectors in view 1.
input_size2 (int (positive)) -- The dimensionality of the input vectors in view 2.
n_components (int (positive), default=2) -- The output dimensionality of the correlated projections. The deep network will transform the data to this size. If not specified, will be set to 2.
use_all_singular_values (boolean (default=False)) -- Whether or not to use all the singular values in the CCA computation to calculate the loss. If False, only the top
n_components
singular values are used.device (string, default='cpu') -- The torch device for processing.
- model1_¶
Deep network for view 1 transformation.
- Type
MlpNet
object
- model2_¶
Deep network for view 2 transformation.
- Type
MlpNet
object
- loss_¶
Loss function for the 2 view DCCA.
- Type
cca_loss
object
Dimension Selection¶
- mvlearn.embed.select_dimension(X, n_components=None, n_elbows=2, threshold=None, return_likelihoods=False)[source]¶
Generates profile likelihoods from an array using the Zhu and Ghodsi method 14. Elbows correspond to the optimal embedding dimension.
- Parameters
X (1d or 2d array-like) -- Input array to generate profile likelihoods for. If 1d-array, it should be sorted in decreasing order. If 2d-array, shape should be (n_samples, n_features).
n_components (int, optional, default: None) -- Number of components to embed. If None, n_components = floor(log2(min(n_samples, n_features))). Ignored if X is a 1d-array.
n_elbows (int, optional, default: 2) -- Number of likelihood elbows to return. Must be > 1.
threshold (float, int, optional, default: None) -- If given, only consider the singular values that are > threshold. Must be >= 0.
return_likelihoods (bool, optional, default: False) -- If True, returns all the likelihoods associated with each elbow.
- Returns
elbows (list) -- Elbows indicate subsequent optimal embedding dimensions. Number of elbows may be less than n_elbows if there are not enough singular values.
sing_vals (list) -- The singular values associated with each elbow.
likelihoods (list of array-like) -- Array of likelihoods corresponding to each elbow. Only returned if return_likelihoods is True.
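A hedged usage sketch (the singular values below are made up): pass a decreasing 1d array and read off the detected elbow dimensions and their singular values.
>>> import numpy as np
>>> from mvlearn.embed import select_dimension
>>> sing_vals = np.array([20.0, 10.0, 8.0, 7.5, 2.0, 1.5, 1.2, 1.1])
>>> elbows, values = select_dimension(sing_vals, n_elbows=2)  # elbow dimensions and their singular values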
References
- 13
Code from the https://github.com/neurodata/graspy package, reproduced and shared with permission.
- 14
Zhu, M. and Ghodsi, A., "Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis." 51(2):918-930, 2006