Polynomial regression

Polynomial regression fits a polynomial equation that predicts a response (Y, sometimes called dependent) variable from a predictor (X, sometimes called independent) variable. The polynomial order limits the number of inflexions (turns) in the curvilinear fitted line: a polynomial of order n can change direction at most n − 1 times.

The requirements of the test are:

A dependent response variable and an independent predictor variable, measured on a continuous scale.
The response must have a curvilinear relationship with the predictor.
Measurement error in the response must be normally distributed and have constant variance, with the predictor free of measurement error.

Arranging the dataset

Data in existing Excel worksheets can be used and should be arranged in a List dataset layout. The dataset must contain at least two continuous scale variables.

When entering new data we recommend using New Dataset to create a new 2-variable dataset ready for data entry.

Using the test

To start the test:

Excel 2007:
Select any cell in the range containing the dataset to analyse, then click Regression on the Analyse-it tab, then click Polynomial.

Excel 97, 2000, 2002 & 2003:
Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Regression then click Polynomial.

Click Variable X (independent) and select the independent predictor.
Click Variable Y (dependent) and select the dependent response.
Enter Polynomial order, the highest power of x in the fitted line equation, which also limits the number of inflexions (bends) in the fitted line. Enter a value between 1 (for a linear fit) and 6 (for terms up to x⁶).
Enter Confidence interval to calculate for the regression coefficients. The level should be entered as a percentage between 50 and 100, without the % sign.
Click OK to run the test.

The report shows the number of observations analysed, and, if applicable, how many missing values were listwise excluded.
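Listwise exclusion means an observation is dropped whenever either of its values is missing. A minimal sketch of the idea, assuming missing values are represented as NaN (the data and variable names here are hypothetical, and this is not the add-in's own code):

```python
import numpy as np

# Hypothetical data; NaN marks a missing value.
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
y = np.array([2.1, np.nan, 9.2, 15.8, 26.1, 35.9])

# Listwise exclusion: keep a row only if both X and Y are present.
complete = ~(np.isnan(x) | np.isnan(y))
x_used, y_used = x[complete], y[complete]

n_analysed = int(complete.sum())
n_excluded = int((~complete).sum())
print(f"observations analysed = {n_analysed}, excluded = {n_excluded}")
```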

R² and adjusted R² statistics summarise the goodness of the polynomial fit to the observations. R² ranges from 0 to 1, with higher values indicating a better fit, and a value of 1 indicating a perfect fit. Adjusted R² is similar to R² except it accounts for the number of predictors in the model (it can fall slightly below 0 for a very poor fit), so adjusted R² statistics from models with a different number of predictors can be compared. R² values should not be compared between models with different numbers of predictors.
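To illustrate how the two statistics relate, here is a minimal sketch using the standard textbook formulas. The function name `r_squared` and the data are hypothetical, and the add-in's internal calculation is assumed rather than known to match line for line:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def r_squared(x, y, order):
    """Return (R², adjusted R²) for a least-squares polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coefs = P.polyfit(x, y, order)          # a, b, c, ... in increasing power
    fitted = P.polyval(x, coefs)
    ss_resid = np.sum((y - fitted) ** 2)    # variation left unexplained
    ss_total = np.sum((y - y.mean()) ** 2)  # variation about the mean
    n = len(y)
    r2 = 1.0 - ss_resid / ss_total
    # Penalise R² for the number of predictor terms (the polynomial order).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - order - 1)
    return r2, adj_r2

# Hypothetical observations.
x = np.arange(1, 11)
y = np.array([2.3, 4.1, 7.2, 11.8, 16.9, 24.2, 31.8, 41.1, 52.3, 63.7])

for order in (1, 2, 3):
    r2, adj = r_squared(x, y, order)
    print(f"order {order}: R2 = {r2:.4f}, adjusted R2 = {adj:.4f}")
```

R² can only stay the same or rise as the order increases, which is why adjusted R² is the statistic to compare across orders.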

Analysis of variance is used to test the hypothesis that the polynomial fit is a better fit than the mean. The total variance, the variance of the response about its mean, is partitioned into variance explained by the polynomial regression model and residual variance (the difference from the fitted line to the observations). An F-test then compares the partitioned variances to determine if they are significantly different. The F statistic shows the ratio of the variances, and the p-value is the probability of observing an F statistic at least this large if the polynomial fit is in fact no better than fitting to the mean. If the p-value is significant then the polynomial fit is better than the mean.
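The partition described above can be sketched as follows. This is an illustration under stated assumptions, not the add-in's implementation: the function name `anova_polynomial` and the data are hypothetical, and the p-value is taken from the upper tail of the F distribution using SciPy.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy import stats

def anova_polynomial(x, y, order):
    """Return (F statistic, p-value) for polynomial fit versus the mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = P.polyval(x, P.polyfit(x, y, order))
    n = len(y)

    ss_total = np.sum((y - y.mean()) ** 2)  # response about its mean
    ss_resid = np.sum((y - fitted) ** 2)    # observations about the fit
    ss_model = ss_total - ss_resid          # explained by the polynomial

    df_model = order                        # one per power of x
    df_resid = n - order - 1                # minus the intercept as well
    f_stat = (ss_model / df_model) / (ss_resid / df_resid)
    p_value = stats.f.sf(f_stat, df_model, df_resid)
    return f_stat, p_value

# Hypothetical observations.
x = np.arange(1, 11)
y = np.array([2.3, 4.1, 7.2, 11.8, 16.9, 24.2, 31.8, 41.1, 52.3, 63.7])

f_stat, p_value = anova_polynomial(x, y, 2)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
```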

The polynomial fit equation

The regression coefficients table shows the polynomial fit coefficients and confidence intervals for each predictor exponent and the intercept. The coefficients combine to form the equation of the polynomial fit, which is used to predict the response from the predictor, as follows:

y = a + bx + cx² + dx³ + ...

where a is the intercept coefficient (the point where the fitted line intersects the Y axis), and b, c, d (and so on) are the coefficients for increasing powers of the X predictor variable.
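As a small illustration of how the coefficients form a prediction equation, the sketch below fits a second-order polynomial with NumPy and evaluates it for a new X value. The data are hypothetical and chosen to follow y = 1 + 2x + 0.5x² exactly, so the recovered coefficients are easy to check.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical observations that follow y = 1 + 2x + 0.5x² exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2

# polyfit returns the coefficients in increasing power: [a, b, c].
a, b, c = P.polyfit(x, y, 2)

def predict(x_new):
    """Evaluate y = a + bx + cx² with the fitted coefficients."""
    return a + b * x_new + c * x_new ** 2

print(a, b, c)        # close to 1, 2 and 0.5 for this exact data
print(predict(10.0))  # close to 71
```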

IMPORTANT When using the equation to predict values for Y, ensure the coefficients are used to at least 4 significant figures. The report shows the values to 4 significant figures, but the cells contain the coefficients to much higher precision should it be needed.
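The effect of truncating coefficients can be seen with a short sketch. The coefficients below are hypothetical, and `round_sig` and `predict` are illustrative helper names; the point is only that the prediction error grows as fewer significant figures are kept, especially for large X values raised to higher powers.

```python
def round_sig(value, figures):
    """Round value to the given number of significant figures."""
    return float(f"{value:.{figures - 1}e}")

def predict(coefs, x):
    """Evaluate a + b*x + c*x**2 + ... for coefs = [a, b, c, ...]."""
    return sum(c * x ** power for power, c in enumerate(coefs))

# Hypothetical full-precision coefficients a, b, c, d of a cubic fit.
full = [1.234567, 0.0123456, 0.000123456, 0.00000123456]
x = 100.0

exact = predict(full, x)
for figures in (2, 4, 6):
    rounded = [round_sig(c, figures) for c in full]
    error = abs(predict(rounded, x) - exact)
    print(f"{figures} significant figures: prediction error = {error:.6f}")
```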

The p-value for each regression coefficient is the probability of obtaining a coefficient at least as far from zero as the one observed if the null hypothesis, that the term has no effect on the response, is in fact true. A significant p-value, or a coefficient confidence interval that doesn't span zero, implies the term makes a significant contribution to the response. If the higher order (higher power) terms are not significant, the polynomial order could be reduced without adversely affecting the goodness of fit.

METHOD The p-value is calculated using the t-distribution.
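A minimal sketch of the coefficient tests, assuming ordinary least squares and a two-sided t-test on each coefficient. The function name `coefficient_table` and the data are hypothetical, and the confidence level is passed as a percentage to mirror the dialog box; the add-in's own calculation may differ in detail.

```python
import numpy as np
from scipy import stats

def coefficient_table(x, y, order, confidence=95.0):
    """Return coefficients, standard errors, p-values and CI limits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, order + 1, increasing=True)  # columns 1, x, x², ...
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    resid = y - X @ beta
    df = n - p                                    # residual degrees of freedom
    s2 = resid @ resid / df                       # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

    t_stat = beta / se
    p_values = 2.0 * stats.t.sf(np.abs(t_stat), df)     # two-sided test
    t_crit = stats.t.ppf(0.5 + confidence / 200.0, df)  # e.g. 0.975 for 95
    return beta, se, p_values, beta - t_crit * se, beta + t_crit * se

# Hypothetical observations.
x = np.arange(1, 11)
y = np.array([2.3, 4.1, 7.2, 11.8, 16.9, 24.2, 31.8, 41.1, 52.3, 63.7])

beta, se, p_values, lower, upper = coefficient_table(x, y, 2)
for name, b, p, lo, hi in zip("abc", beta, p_values, lower, upper):
    print(f"{name} = {b:.4g}  p = {p:.3g}  95% CI {lo:.4g} to {hi:.4g}")
```

At the 95% level, a coefficient's interval excludes zero exactly when its p-value is below 0.05, which is why either can be used to judge a term.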

Examining the scatter plot

A scatter plot allows visual assessment of the relationship between the response and predictor variables. The plot can show the fit, the confidence interval for the fit, and prediction intervals.

To modify the scatter plot:

If the Polynomial regression dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Scatter plot and select with Fit to show the fit, with Fit + CI to show the fit with confidence interval for the fit, with Fit + PI to show the fit with prediction interval, or with Fit + CI + PI to show the fit with confidence and prediction intervals.
Click OK.

The scatter plot shows the predictor (X axis) plotted against the response (Y axis). The polynomial fit (thick blue line) is shown surrounded by confidence interval bands (blue lines) showing the probable range of the polynomial fit in the underlying population. If the fit is good, the confidence interval bands will be close to the polynomial fit line. A prediction interval band (black lines) shows where future observations will likely lie.
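The difference between the two kinds of band can be sketched with the standard least-squares formulas. The function name `fit_intervals` and the data are hypothetical, and this is an assumption about how such bands are usually computed rather than a description of the add-in's code. The prediction interval is always wider because it adds the scatter of a single future observation to the uncertainty in the fit itself.

```python
import numpy as np
from scipy import stats

def fit_intervals(x, y, order, x_new, confidence=95.0):
    """Return fitted values with CI and PI half-widths at x_new."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    X = np.vander(x, order + 1, increasing=True)
    X_new = np.vander(x_new, order + 1, increasing=True)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n, p = X.shape
    df = n - p
    resid = y - X @ beta
    s2 = resid @ resid / df                       # residual variance
    xtx_inv = np.linalg.inv(X.T @ X)

    # x0' (X'X)^-1 x0 for every row x0 of X_new.
    h = np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)
    t_crit = stats.t.ppf(0.5 + confidence / 200.0, df)

    fit = X_new @ beta
    ci_half = t_crit * np.sqrt(s2 * h)            # uncertainty in the fit
    pi_half = t_crit * np.sqrt(s2 * (1.0 + h))    # plus one new observation
    return fit, ci_half, pi_half

# Hypothetical observations.
x = np.arange(1, 11)
y = np.array([2.3, 4.1, 7.2, 11.8, 16.9, 24.2, 31.8, 41.1, 52.3, 63.7])

grid = np.linspace(1, 10, 5)
fit, ci_half, pi_half = fit_intervals(x, y, 2, grid)
for g, f, c, q in zip(grid, fit, ci_half, pi_half):
    print(f"x = {g:5.2f}  fit = {f:6.2f}  CI ±{c:.2f}  PI ±{q:.2f}")
```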

Examining the residual plot

The residual plot allows visual assessment of the distance of each observation from the fitted line. The plot can show raw or standardized residuals, optionally with a histogram.

To modify the residual plot:

If the Polynomial regression dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Residual plot and select Raw to plot the actual residuals (differences from the fitted line), or select Standardized to show the residuals standardized (divided by the SE).
Tick with Histogram of Residuals to show a histogram (with normal overlay) of the distribution of the residuals.
Click OK.

The residual plot shows the distance of each observation from the fitted line. If the prior assumption of constant variance is met, the points should be scattered randomly in a constant-width band. If the fit is good and the residuals are shown standardized, the points should lie within roughly ±2 to 3 SDs of zero. Standardized residuals further out, at ±4 SDs or beyond, should be investigated as possible outliers.
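As an illustration of the outlier check, the sketch below standardizes residuals by dividing each one by the residual standard error and flags any at ±4 or beyond. Treating "the SE" as the residual standard error is an assumption (some packages also adjust for leverage), and the function name and data, including the deliberately planted outlier, are hypothetical.

```python
import numpy as np

def standardized_residuals(x, y, order):
    """Return residuals divided by the residual standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, order + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - (order + 1)
    se = np.sqrt(resid @ resid / df)  # assumed meaning of "the SE"
    return resid / se

# Hypothetical data: a quadratic trend with small deterministic wobble,
# plus one planted outlier at index 12.
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 2.0 * x + 0.5 * x ** 2 + 0.3 * np.sin(7.0 * x)
y[12] += 8.0

std = standardized_residuals(x, y, 2)
flagged = np.flatnonzero(np.abs(std) >= 4.0)
print("observations to investigate:", flagged)
```

With very few observations a single residual cannot reach ±4 under this definition, so the check is most useful on larger datasets.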

The histogram of the residuals allows visual assessment of the assumption that the measurement errors in the response variable are normally distributed.

References to further reading

Applied Regression Analysis (3rd Edition).
Norman R. Draper, Harry Smith. 1998. ISBN 0-471-17082-8.
