FIML – Full-information Maximum Likelihood for missing data in Stata (“How to”/Pitfalls)


When data are missing, “Full-information Maximum Likelihood” (FIML) is an alternative to multiple imputation that requires considerably fewer decisions from the researcher, and fewer “researcher degrees of freedom” are arguably preferable (cf. here).

FIML in Stata

FIML requires the use of “structural equation models” and the “missing at random (MAR)” assumption regarding the missing values. (For an introduction: here).

Stata implements FIML through its SEM suite. FIML requires the maximum likelihood estimation method option:

method(ml) // Normal maximum likelihood

To use FIML for missing values, simply add “mv” (for “missing values”) to the option:

method(mlmv) // Full-information maximum likelihood estimation


**Load example data**
sysuse auto

**Variable with missing data:**
codebook rep78

**OLS regression** 
regress price rep78 mpg

**Regression using SEM**
sem (price <- rep78 mpg )

* Output: Number of obs = 69 (the 5 observations with missing rep78 are dropped)

**Regression using SEM - Full-information maximum likelihood**
sem (price <- rep78 mpg ), method(mlmv)

* Output: Number of obs = 74 (all observations are used, including the 5 with missing rep78)
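To see what FIML changes beyond the sample size, the two SEM fits can be stored and tabulated side by side. This is a minimal sketch using the same auto data; the stored-estimate names listwise and fiml are arbitrary labels chosen here for illustration.

```stata
* Sketch: compare listwise-deletion ML and FIML estimates side by side
sysuse auto, clear

* Default ML: observations with missing rep78 are dropped
sem (price <- rep78 mpg), method(ml)
estimates store listwise    // arbitrary name, chosen for this example

* FIML: observations with missing rep78 are kept
sem (price <- rep78 mpg), method(mlmv)
estimates store fiml        // arbitrary name, chosen for this example

* Coefficients, standard errors, and sample size for both models
estimates table listwise fiml, b(%9.3f) se(%9.3f) stats(N)
```

The FIML column will probably list some extra parameters (means and variances of rep78 and mpg), because the missing-value likelihood also models the predictors; the rows to compare are the price equation coefficients and N.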

Pitfalls with FIML

Always check whether your FIML results include all observations. FIML sometimes seems not to work: only the complete observations are used, and the observations with missing values are not taken into account.
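The check can be automated by comparing the estimation sample size with the number of rows in the data. A minimal sketch, assuming the model is fit on the whole dataset (no if/in restriction):

```stata
sysuse auto, clear
sem (price <- rep78 mpg), method(mlmv)

* e(N) is the number of observations used in estimation; _N is the dataset size
display "Used in estimation: " e(N) "   In dataset: " _N

* Stop the do-file with an error if any observation was dropped
assert e(N) == _N
```

A failing assert is a prompt to investigate, not proof of a coding problem: an observation that is missing on every variable in the model carries no information and may be dropped legitimately.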

The most common reason for FIML not to work in Stata is the coding of missing values. For FIML to work, all missing values need to be coded as “.”, not as “.a” or “.b”, or, worse, as “999” or “888” à la SPSS, or as “NA” à la R.

Before running FIML, always recode missing values:

mvdecode _all, mv(333=. \999=. \666=.)

recode VARNAME (.a = .) (.b=.)
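Rather than recoding variable by variable, extended missing values can be located and converted across all numeric variables in one pass. The following is a sketch, not a definitive recipe; the .a placed in rep78 is introduced purely for illustration and is not part of the original auto data.

```stata
sysuse auto, clear

* Illustration only: plant one extended missing value in rep78
replace rep78 = .a in 1

* Counts of system missing (Obs=.) and extended missing (Obs>.) for rep78
misstable summarize rep78

* Convert every extended missing value (.a-.z) to system missing (.)
* In Stata's ordering . < .a < ... < .z, so "> ." matches only extended codes
ds, has(type numeric)
foreach v of varlist `r(varlist)' {
    replace `v' = . if `v' > .
}

* Confirm that no extended missing values remain in rep78
count if rep78 > .
assert r(N) == 0
```

Numeric sentinel codes such as 999 are not caught by this loop and still need mvdecode, as above.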

Other sources

Excellent slides on Multiple Imputation and FIML in Stata: