Missing values imputation with missMDA

The best thing to do with missing values is not to have any” (Gertrude Mary Cox)

Unfortunately, missing values are ubiquitous and occur for plenty of reasons. One solution is single imputation which consists in replacing missing entries with plausible values. It leads to a complete dataset that can be analyzed by any statistical methods.

Based on dimensionality reduction methods, the missMDA package successfully imputes large and complex datasets with quantitative variables, categorical variables and mixed variables. Indeed, it imputes data with principal component methods that take into account the similarities between the observations and the relationship between variables. It has proven to be very competitive in terms of quality of the prediction compared to the state of the art methods.

With 3 lines of code, we impute the dataset orange available in missMDA:

library(missMDA)

data(orange)

nbdim <- estim_ncpPCA(orange) # estimate the number of dimensions to impute

res.comp <- imputePCA(orange, ncp = nbdim)

In the same way, imputeMCA imputes datasets with categorical variables and imputeFAMD imputes mixed datasets.

With a completed data, we can pursue our analyses… however we need to be careful and not to forget that the data was incomplete! In a future post, we will see how to visualize the uncertainty on these predicted values.

You can find more information in this JSS paper, on this website, on this tutorial given at useR!2016 at stanford.

You can also watch this playlist on Youtube to practice with R.

You can also enroll in this MOOC.

Advertisements

2 thoughts on “Missing values imputation with missMDA

    • Here are the lines of code:

      data(vnf)
      ## First the number of components has to be chosen
      ## (for the reconstruction step)
      ## nb <- estim_ncpMCA(vnf,ncp.max=5) ## Time-consuming, nb = 4

      ## Imputation
      res.impute <- imputeMCA(vnf, ncp=4)

      The disjunctive data table is completed in res.impute$tab.disj
      and the dataset is completed in res.impute$completeObs

      FH

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s