---
title: "An Introduction to `multiview`"
author:
  - Daisy Yi Ding
  - Shuangning Li
  - Balasubramanian Narasimhan
  - Robert Tibshirani
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  pdf_document:
    toc: yes
    toc_depth: '3'
  html_document:
    toc: yes
    toc_depth: '3'
    df_print: paged
link-citations: yes
bibliography: assets/coop_refs.bib
vignette: >
  %\VignetteIndexEntry{An Introduction to multiview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r include=FALSE}
# the code in this chunk enables us to truncate the print output for each
# chunk using the `out.lines` option

# save the built-in output hook
hook_output <- knitr::knit_hooks$get("output")

# set a new output hook to truncate text output
knitr::knit_hooks$set(output = function(x, options) {
  if (!is.null(n <- options$out.lines)) {
    x <- xfun::split_lines(x)
    if (length(x) > n) {
      # truncate the output
      x <- c(head(x, n), "....\n")
    }
    x <- paste(x, collapse = "\n")
  }
  hook_output(x, options)
})
```

## Introduction

`multiview` is a package that fits a supervised learning model called _cooperative learning_ for multiple sets of features ("views"), as described in @cooperative. The method combines the usual squared-error loss of predictions (or, more generally, deviance loss) with an "agreement" penalty that encourages the predictions from different data views to agree. By varying the weight of the agreement penalty, we obtain a continuum of solutions that includes the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) adaptively, using a validation set or cross-validation to estimate test-set prediction error. In addition, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity.

This vignette describes the basic usage of `multiview` in R. `multiview` is built on the `glmnet` package and retains its core features. The package includes functions for cross-validation, prediction, and plotting.

For two data views, consider feature matrices $X \in \mathcal R^{n\times p_x}$, $Z \in \mathcal R^{n\times p_z}$, and our target $y \in \mathcal R^{n}$. We assume that the columns of $X$ and $Z$ have been standardized, and that $y$ has mean 0 (hence we can omit the intercept below). For a fixed value of the hyperparameter $\rho\geq 0$, `multiview` finds $\theta_x \in \mathcal R^{p_x}$ and $\theta_z \in \mathcal R^{p_z}$ that minimize:
$$\frac{1}{2} ||y-X\theta_x- Z\theta_z||^2+ \frac{\rho}{2}||X\theta_x- Z\theta_z||^2+ \lambda \Bigl[\alpha\bigl(||\theta_x||_1+||\theta_z||_1\bigr)+ (1-\alpha)\bigl(||\theta_x||_2^2/2+||\theta_z||_2^2/2\bigr)\Bigr].$$

Here the _agreement_ penalty is controlled by $\rho$. When the weight on the agreement term $\rho$ is set to 0, cooperative learning reduces to a form of early fusion: we simply concatenate the columns of the different views and apply lasso or another regularized regression method. Moreover, we show in our paper that when $\rho$ is set to 1, the solutions are the average of the marginal fits for $X$ and $Z$, which is a simple form of late fusion. The _elastic net_ penalty is controlled by $\alpha$, bridging the gap between lasso regression ($\alpha=1$, the default) and ridge regression ($\alpha=0$), and the tuning parameter $\lambda$ controls the overall strength of the penalty. We can compute a regularization path of solutions indexed by $\lambda$.
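To make the two-view objective concrete, here is a minimal fitting sketch on simulated data. It assumes the package's main entry point `multiview()` accepts a named list of feature matrices, a response, and the agreement weight `rho`, and that the fitted object has a `plot()` method analogous to `glmnet`'s; the simulated data and variable names are illustrative only.

```{r}
# Illustrative two-view example: views x and z, response y
set.seed(1)
n <- 100; p_x <- 10; p_z <- 10
x <- matrix(rnorm(n * p_x), n, p_x)
z <- matrix(rnorm(n * p_z), n, p_z)
# response driven by one column of each view, plus noise
y <- x[, 1] + z[, 1] + rnorm(n)

library(multiview)
# moderate agreement penalty; rho = 0 corresponds to early fusion
fit <- multiview(list(x = x, z = z), y, rho = 0.5)
plot(fit)  # coefficient paths along the lambda sequence
```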
For more than two views, this generalizes easily. Assume that we have `M` data matrices $X_1 \in \mathcal R^{n\times p_1}, X_2\in \mathcal R^{n\times p_2},\ldots, X_M\in \mathcal R^{n\times p_M}$; `multiview` solves the problem
$$\frac{1}{2} \Big|\Big|y- \sum_{m=1}^{M} X_m\theta_m\Big|\Big|^2+ \frac{\rho}{2}\sum_{m<m'}||X_m\theta_m- X_{m'}\theta_{m'}||^2+ \lambda \sum_{m=1}^{M}\Bigl[\alpha||\theta_m||_1+ (1-\alpha)||\theta_m||_2^2/2\Bigr].$$
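In practice the agreement weight $\rho$ can be chosen by comparing cross-validated error over a small grid. The sketch below assumes `cv.multiview()` takes the same list-of-matrices interface and a `rho` argument, and that, like `cv.glmnet()`, the returned object stores the mean cross-validated error in `$cvm`; treat these as assumptions rather than a definitive recipe.

```{r}
# Sketch: pick rho by comparing minimum cross-validated error on a grid
# (assumes cv.multiview() mirrors cv.glmnet(), with $cvm the mean CV error)
rhos <- c(0, 0.25, 0.5, 1)
cv_err <- sapply(rhos, function(r) {
  cvfit <- cv.multiview(list(x = x, z = z), y, rho = r)
  min(cvfit$cvm)
})
best_rho <- rhos[which.min(cv_err)]
best_rho
```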