The pls() function offers some very basic approaches for
handling missing values in the data, specified via the
missing argument. Currently, there are three options:
- Listwise deletion (missing = "listwise")
- Mean imputation (missing = "mean")
- k-nearest neighbors imputation (missing = "kNN")

The last two options are single imputation approaches. The
pls() function does not currently offer any multiple
imputation approaches, but we show how users can do this
themselves, using the mice package, at the end of the
vignette.
With missing="listwise" (the default), any observation
(i.e., a row) containing missing values for the variables used in the
model is removed. Here is an example.
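Before the pls() call, here is a rough base-R illustration of what listwise deletion amounts to. It is only a sketch on made-up data: the toy data frame is hypothetical, and the code is not taken from the internals of pls().

```r
# Toy data with missing values (hypothetical; not the titanic data)
toy <- data.frame(
  Survived = c(1, 0, NA, 1, 0),
  Age      = c(22, NA, 30, 41, 35),
  Female   = c(1, 0, 1, 0, 1)
)

# Variables used in the model
vars <- c("Survived", "Age", "Female")

# Keep only the rows with no missing values in the model variables
complete_rows <- complete.cases(toy[vars])
toy_listwise  <- toy[complete_rows, , drop = FALSE]

nrow(toy)           # 5 rows before deletion
nrow(toy_listwise)  # 3 rows after deletion
```

Rows 2 and 3 are dropped because each has a missing value in one of the model variables, even though their other values are observed.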
model <- "Survived ~ Age + Female + Age:Female"
fit <- pls(model, data = titanic, missing = "listwise", ordered = "Survived")
#> plssem->getPLS_Data():
#> Removing missing data using listwise deletion...

With missing="mean", missing values are imputed with
(univariate) expected values. For continuous variables, missing values are
imputed using the mean. For ordinal variables with more than two
categories, missing values are imputed with the median. For binary
ordered variables, missing values are imputed with the mode.
In our example, missing values in Age are imputed with
the mean of Age. Both Survived and Female are
binary variables, so their missing values are imputed with the most
common value (the mode).
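The three rules above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the code used inside pls(): the helpers stat_mode() and impute_univariate() are hypothetical names, and details such as tie handling in the mode are assumptions.

```r
# Mode of a vector, ignoring missing values (ties go to the first value seen)
stat_mode <- function(x) {
  x  <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: fill NAs with the mean, median, or mode of the column
impute_univariate <- function(x, type = c("mean", "median", "mode")) {
  type <- match.arg(type)
  fill <- switch(type,
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    mode   = stat_mode(x)
  )
  x[is.na(x)] <- fill
  x
}

toy <- data.frame(
  Age    = c(20, 30, NA, 40),  # continuous            -> mean
  Class  = c(1, 3, 3, NA),     # ordinal, 3 categories -> median
  Female = c(1, 1, 0, NA)      # binary                -> mode
)

toy$Age    <- impute_univariate(toy$Age, "mean")
toy$Class  <- impute_univariate(toy$Class, "median")
toy$Female <- impute_univariate(toy$Female, "mode")
toy
```

Every missing entry in a column receives the same value, which is why this kind of single imputation tends to understate the variability in the data.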
model <- "Survived ~ Age + Female + Age:Female"
fit <- pls(model, data = titanic, missing = "mean", ordered = "Survived")
#> plssem->getPLS_Data():
#> Imputing missing data using mean imputation...

With missing="kNN", missing values are imputed by finding
the k nearest (complete data) neighbors of an observation with missing
data. The values of the k neighbors are then aggregated
using either the mean, median or the mode, depending on the data type of
the variable. The number of neighbors, k, can be specified
using the knn.k argument.
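To make the idea concrete, here is a simplified base-R sketch of kNN imputation. It handles only a numeric target aggregated with the mean, and it assumes unscaled Euclidean distance and fully observed predictors. The helper knn_impute_mean() is a hypothetical name, and the distance measure used by pls() is not described here, so treat this as an illustration rather than the package's implementation.

```r
# Hypothetical helper: impute NAs in the numeric column `target` using the
# mean of the k nearest complete rows, with distance based on `predictors`
knn_impute_mean <- function(data, target, predictors, k = 5) {
  # Donors are the rows without any missing values
  donors <- data[complete.cases(data), , drop = FALSE]
  k <- min(k, nrow(donors))
  for (i in which(is.na(data[[target]]))) {
    # Euclidean distance from row i to every donor row
    diffs <- sweep(as.matrix(donors[predictors]), 2,
                   unlist(data[i, predictors]))
    d <- sqrt(rowSums(diffs^2))
    nearest <- order(d)[seq_len(k)]
    data[i, target] <- mean(donors[[target]][nearest])
  }
  data
}

# Toy data (made up): the last row is missing Age
toy <- data.frame(
  Age  = c(20, 22, 24, 60, 62, NA),
  Fare = c(10, 11, 12, 80, 82, 11)
)

imputed <- knn_impute_mean(toy, target = "Age", predictors = "Fare", k = 3)
imputed$Age[6]  # 22, the mean Age of the three rows closest in Fare
```

Unlike mean imputation, the imputed value depends on the other variables of the observation, so rows with different predictor values can receive different imputations.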
model <- "Survived ~ Age + Female + Age:Female"
fit <- pls(model, data = titanic, missing = "kNN",
           ordered = "Survived", knn.k = 5) # use the 5 nearest neighbors
#> plssem->getPLS_Data():
#> Imputing missing data using k-Nearest Neighbors (kNN), k = 5...
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!

Multiple imputation cannot be performed using the
pls() function alone, but it can be performed using other
multiple imputation packages available in R. Here we use
the mice package, but other packages can be used as well
(e.g., the Amelia package).
library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
m <- 20 # Number of imputations
vars <- c("Survived", "Age", "Female") # Variables to impute/use in the analysis
imputations <- mice(titanic[vars], m = m)
#>
#> iter imp variable
#> 1 1 Survived Age
#> 1 2 Survived Age
#> 1 3 Survived Age
#> 1 4 Survived Age
#> 1 5 Survived Age
#> 1 6 Survived Age
#> 1 7 Survived Age
#> 1 8 Survived Age
#> 1 9 Survived Age
#> 1 10 Survived Age
#> 1 11 Survived Age
#> 1 12 Survived Age
#> 1 13 Survived Age
#> 1 14 Survived Age
#> 1 15 Survived Age
#> 1 16 Survived Age
#> 1 17 Survived Age
#> 1 18 Survived Age
#> 1 19 Survived Age
#> 1 20 Survived Age
#> 2 1 Survived Age
#> 2 2 Survived Age
#> 2 3 Survived Age
#> 2 4 Survived Age
#> 2 5 Survived Age
#> 2 6 Survived Age
#> 2 7 Survived Age
#> 2 8 Survived Age
#> 2 9 Survived Age
#> 2 10 Survived Age
#> 2 11 Survived Age
#> 2 12 Survived Age
#> 2 13 Survived Age
#> 2 14 Survived Age
#> 2 15 Survived Age
#> 2 16 Survived Age
#> 2 17 Survived Age
#> 2 18 Survived Age
#> 2 19 Survived Age
#> 2 20 Survived Age
#> 3 1 Survived Age
#> 3 2 Survived Age
#> 3 3 Survived Age
#> 3 4 Survived Age
#> 3 5 Survived Age
#> 3 6 Survived Age
#> 3 7 Survived Age
#> 3 8 Survived Age
#> 3 9 Survived Age
#> 3 10 Survived Age
#> 3 11 Survived Age
#> 3 12 Survived Age
#> 3 13 Survived Age
#> 3 14 Survived Age
#> 3 15 Survived Age
#> 3 16 Survived Age
#> 3 17 Survived Age
#> 3 18 Survived Age
#> 3 19 Survived Age
#> 3 20 Survived Age
#> 4 1 Survived Age
#> 4 2 Survived Age
#> 4 3 Survived Age
#> 4 4 Survived Age
#> 4 5 Survived Age
#> 4 6 Survived Age
#> 4 7 Survived Age
#> 4 8 Survived Age
#> 4 9 Survived Age
#> 4 10 Survived Age
#> 4 11 Survived Age
#> 4 12 Survived Age
#> 4 13 Survived Age
#> 4 14 Survived Age
#> 4 15 Survived Age
#> 4 16 Survived Age
#> 4 17 Survived Age
#> 4 18 Survived Age
#> 4 19 Survived Age
#> 4 20 Survived Age
#> 5 1 Survived Age
#> 5 2 Survived Age
#> 5 3 Survived Age
#> 5 4 Survived Age
#> 5 5 Survived Age
#> 5 6 Survived Age
#> 5 7 Survived Age
#> 5 8 Survived Age
#> 5 9 Survived Age
#> 5 10 Survived Age
#> 5 11 Survived Age
#> 5 12 Survived Age
#> 5 13 Survived Age
#> 5 14 Survived Age
#> 5 15 Survived Age
#> 5 16 Survived Age
#> 5 17 Survived Age
#> 5 18 Survived Age
#> 5 19 Survived Age
#> 5 20 Survived Age
COEF <- NULL # Matrix with estimated coefficients for each imputation
BOOT <- NULL # Matrix with all the bootstraps from all imputations
model <- "Survived ~ Age + Female + Age:Female"
for (i in seq_len(m)) {
  fit.i <- pls(model, data = complete(imputations, i), # get the ith imputation
               ordered = "Survived",
               bootstrap = TRUE,
               boot.R = 100,
               boot.parallel = "multicore", # Use parallel bootstrap
               boot.ncores = 2L)
  COEF <- rbind(COEF, coef(fit.i))
  BOOT <- rbind(BOOT, boot(fit.i))
}
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 23 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 25 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->bootstrap():
#> Kept 13 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->bootstrap():
#> Kept 34 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 25 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 38 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 13 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 13 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->bootstrap():
#> Kept 25 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 32 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 15 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 27 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->bootstrap():
#> Kept 24 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->bootstrap():
#> Kept 8 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 25 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->bootstrap():
#> Kept 13 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->bootstrap():
#> Kept 4 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->bootstrap():
#> Kept 16 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 25 (out of 100) bootstrap replicate(s) with inadmissible solutions.
#> Warning: plssem->mcpls():
#> Base fit is inadmissible! The MC-PLS algorithm might not converge to a
#> proper solution!
#> Warning: plssem->bootstrap():
#> Kept 25 (out of 100) bootstrap replicate(s) with inadmissible solutions.
round(apply(COEF, MARGIN = 2, FUN = mean), 3) # Mean estimate across imputations
#>     Survived<~Survived               Age<~Age         Female<~Female
#>                  1.000                  1.000                  1.000
#>           Survived~Age        Survived~Female    Survived~Age:Female
#>                 -0.074                  0.655                  0.209
#>     Survived~~Survived               Age~~Age            Age~~Female
#>                  0.490                  1.000                 -0.056
#>        Age~~Age:Female         Female~~Female     Female~~Age:Female
#>                 -0.001                  1.000                  0.002
#> Age:Female~~Age:Female            Survived|t1
#>                  1.003                  0.221
round(apply(BOOT, MARGIN = 2, FUN = sd), 3) # Standard errors
#>     Survived<~Survived               Age<~Age         Female<~Female
#>                  0.000                  0.000                  0.000
#>           Survived~Age        Survived~Female    Survived~Age:Female
#>                  0.052                  0.075                  0.056
#>     Survived~~Survived               Age~~Age            Age~~Female
#>                  0.074                  0.040                  0.044
#>        Age~~Age:Female         Female~~Female     Female~~Age:Female
#>                  0.067                  0.016                  0.038
#> Age:Female~~Age:Female            Survived|t1
#>                  0.055                  0.142
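The standard errors above are the standard deviations of the bootstrap draws stacked across all imputations. A commonly used alternative for multiply imputed data is Rubin's rules, which combine the average within-imputation variance with the variance of the estimates between imputations. The sketch below shows the computation on made-up numbers. The helper pool_rubin() is a hypothetical name and is not part of plssem or mice. Applying it to the example above would need a matrix of per-imputation standard errors (for instance, the standard deviation of each imputation's own bootstrap draws), which is an assumption on our part and is not computed in the code above.

```r
# Hypothetical helper (not part of plssem or mice): pool with Rubin's rules
# est: m x p matrix of point estimates, one row per imputation
# se:  m x p matrix of standard errors, one row per imputation
pool_rubin <- function(est, se) {
  m     <- nrow(est)
  qbar  <- colMeans(est)        # pooled point estimate
  W     <- colMeans(se^2)       # within-imputation variance
  B     <- apply(est, 2, var)   # between-imputation variance
  total <- W + (1 + 1 / m) * B  # total variance
  cbind(estimate = qbar, se = sqrt(total))
}

# Toy input: 3 imputations, 2 parameters (made-up numbers)
est <- rbind(c(a = 0.10, b = 0.50),
             c(a = 0.20, b = 0.50),
             c(a = 0.30, b = 0.50))
se  <- matrix(0.10, nrow = 3, ncol = 2, dimnames = list(NULL, c("a", "b")))

pooled <- pool_rubin(est, se)
pooled
```

In the toy input, parameter b has identical estimates in every imputation, so its pooled standard error equals the within-imputation value of 0.10. Parameter a varies between imputations, so its pooled standard error is larger.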