Residual plot

A residual plot shows the residuals, that is, the differences between the observed and fitted response values, typically plotted against the fitted values.

The ideal residual plot, called the null residual plot, shows a random scatter of points forming a band of approximately constant width around the horizontal line at zero.
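As a minimal sketch of the quantities a residual plot displays, the following standard-library Python fits a straight line by ordinary least squares and computes the fitted values and residuals. The helper name `fit_line` and the data are illustrative assumptions, not part of any particular package.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Illustrative data, made up for this sketch.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Each (fitted, residual) pair is one point on the residual plot.
# Because the model includes an intercept, the residuals sum to zero,
# so the scatter is centred vertically on zero.
for f, e in zip(fitted, residuals):
    print(f"{f:8.3f} {e:8.3f}")
```

Plotting `residuals` against `fitted` with any charting tool gives the residual plot; the printed pairs are the coordinates of its points.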

It is important to check the fit of the model and its assumptions (constant variance, normality, and independence of the errors) using the residual plot together with the normal, sequence, and lag plots.

Assumption: How to check

Model function is linear: The points form a systematic pattern, such as a curve, when the model function is incorrect.

You might be able to transform variables or add polynomial and interaction terms to remove the pattern.
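The effect of a transformation on the pattern can be sketched as follows. The data are deliberately noise-free and made up (the response is exactly the square of the predictor), and `fit_residuals` is a hypothetical helper written for this example.

```python
from statistics import fmean


def fit_residuals(x, y):
    """Fit y = a + b*x by least squares and return the residuals."""
    mx, my = fmean(x), fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = list(range(1, 11))
y = [xi ** 2 for xi in x]  # curved relationship, no noise

# Straight-line fit on x: residuals are positive at both ends and
# negative in the middle, which appears as a U-shaped pattern.
raw = fit_residuals(x, y)

# Fit on the transformed predictor x**2: the pattern disappears.
transformed = fit_residuals([xi ** 2 for xi in x], y)

print([round(e, 2) for e in raw])
print([round(e, 2) for e in transformed])
```

With real data the residuals will not vanish after a transformation; the point is only that the systematic curve should give way to random scatter.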

Constant variance: If the points tend to form a band of increasing, decreasing, or otherwise non-constant width, then the variance is not constant.

You should consider transforming the response variable or incorporating weights into the model. When variance increases as a percentage of the response, you can use a log transform, although you should ensure it does not produce a poorly fitting model.

Even with non-constant variance, the parameter estimates remain unbiased, although somewhat inefficient. However, the hypothesis tests and confidence intervals are inaccurate because the estimated standard errors are unreliable.
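A rough numeric companion to the visual check is to compare the spread of the residuals at low and high fitted values. The sketch below is an informal illustration, not a formal test: the helper names, the synthetic data (error fixed at plus or minus 10% of the response), and the choice to split at the median fitted value are all assumptions. The predictor is logged along with the response here so that the relationship stays linear after the transform.

```python
from math import log
from statistics import fmean, pstdev


def fitted_and_residuals(x, y):
    """Fit y = a + b*x by least squares; return (fitted, residuals)."""
    mx, my = fmean(x), fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    fitted = [a + b * xi for xi in x]
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]


def spread_ratio(fitted, residuals):
    """Std. dev. of residuals in the upper half of fitted values
    divided by the std. dev. in the lower half."""
    ordered = [e for _, e in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    return pstdev(ordered[half:]) / pstdev(ordered[:half])


# Synthetic data: the error is a fixed percentage of the response.
x = list(range(1, 21))
y = [2 * xi * (1.1 if xi % 2 else 0.9) for xi in x]

ratio_raw = spread_ratio(*fitted_and_residuals(x, y))
ratio_log = spread_ratio(
    *fitted_and_residuals([log(v) for v in x], [log(v) for v in y])
)

print(f"raw scale: {ratio_raw:.2f}")  # well above 1: widening band
print(f"log scale: {ratio_log:.2f}")  # close to 1: roughly constant band
```

A ratio far from 1 on the raw scale that moves close to 1 after the log transform corresponds to a widening band becoming a band of constant width.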

Normality: Examine the normal plot of the residuals to identify non-normality.

Violation of the normality assumption only becomes an issue with small sample sizes. For large sample sizes, the assumption is less important because of the central limit theorem, and because the F- and t-tests used for hypothesis tests and confidence intervals are quite robust to modest departures from normality.
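The coordinates of a normal plot can be sketched with the standard library alone. The plotting position (i - 0.5) / n used here is one common convention and is an assumption of this example, as are the function name and the residual values.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its theoretical normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n; other conventions exist.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(residuals)))


# Illustrative residuals, made up for this sketch.
residuals = [-1.2, 0.4, -0.3, 0.9, 0.1, -0.6, 1.5, -0.8]
points = normal_plot_points(residuals)

for q, e in points:
    print(f"{q:7.3f} {e:6.2f}")
```

If the residuals are approximately normal, these points fall close to a straight line; marked curvature or outlying points at the ends suggest non-normality.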

Independence: When the order of the cases in the dataset is the order in which they occurred:

Examine a sequence plot of the residuals against the order to identify any dependency between the residual and time.

Examine a lag-1 plot of each residual against the previous residual to identify serial correlation, where the observations are not independent because each one is correlated with the one before it.

Time-series analysis may be more suitable to model data where serial correlation is present.
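The lag-1 plot can be summarised numerically by the correlation between each residual and the one before it. The following is a minimal sketch using made-up residual sequences; `lag1_correlation` is a hypothetical helper, and the value it returns is only a rough guide to what the plot shows.

```python
from math import sqrt
from statistics import fmean


def lag1_correlation(residuals):
    """Pearson correlation between each residual and the previous one."""
    prev, curr = residuals[:-1], residuals[1:]
    mp, mc = fmean(prev), fmean(curr)
    cov = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    var_p = sum((p - mp) ** 2 for p in prev)
    var_c = sum((c - mc) ** 2 for c in curr)
    return cov / sqrt(var_p * var_c)


# Residuals in time order that drift slowly up and down:
# neighbouring values are similar, so the lag-1 correlation is positive.
drifting = [1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, 0]

# Residuals that flip sign at every step: the lag-1 correlation is negative.
alternating = [1, -1, 1, -1, 1, -1]

r_drift = lag1_correlation(drifting)
r_alt = lag1_correlation(alternating)

print(f"drifting:    {r_drift:.2f}")
print(f"alternating: {r_alt:.2f}")
```

For independent errors the lag-1 plot shows no trend and this correlation is close to zero; values well away from zero in either direction point to serial correlation.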

For a model with many terms, it can be difficult to identify specific problems using the residual plot. A non-null residual plot indicates that there are problems with the model, but not necessarily what these are.
