Polynomial regression determines the polynomial equation that predicts a response (Y, sometimes called dependent) variable from a predictor (X, sometimes called independent) variable. The polynomial order sets the maximum number of turns in the curvilinear fitted line: a polynomial of order n can have at most n - 1 turns.
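As a rough illustration of the link between order and turns, the sketch below finds the stationary points (where the slope is zero) of a polynomial using NumPy's `numpy.polynomial.Polynomial`. The derivative of an order-n polynomial has order n - 1, so it has at most n - 1 real roots, and therefore the curve has at most n - 1 turns. The helper name `stationary_points` is hypothetical and not part of Analyse-it.

```python
import numpy as np
from numpy.polynomial import Polynomial


def stationary_points(coefs):
    """Real x values where the slope of a + b*x + c*x**2 + ... is zero.

    `coefs` are listed in increasing order of power: [a, b, c, ...].
    A turn is a stationary point where the slope changes sign, so an
    order-n polynomial has at most n - 1 turns.
    """
    slope = Polynomial(coefs).deriv()      # derivative has order n - 1
    roots = np.atleast_1d(slope.roots())   # real or complex roots of the slope
    # Keep real roots only; complex roots are not points on the fitted line.
    real = roots[np.isclose(roots.imag, 0.0)].real
    return np.sort(real)


# Cubic y = x**3 - 3x: expect stationary points at about x = -1 and x = 1.
print(stationary_points([0.0, -3.0, 0.0, 1.0]))
```

A straight line (order 1) has a constant slope, so it has no stationary points and no turns.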
The requirements of the test are:
Data in existing Excel worksheets can be used and should be arranged in a List dataset layout. The dataset must contain at least two continuous scale variables.
When entering new data, we recommend using New Dataset to create a new two-variable dataset ready for data entry.
To start the test:
Excel 97, 2000, 2002 & 2003: Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Regression then click Polynomial.
The report shows the number of observations analysed and, if applicable, how many observations with missing values were excluded listwise.
R² and adjusted R² statistics summarise how well the polynomial fits the observations. R² ranges from 0 to 1, with higher values indicating a better fit and a value of 1 indicating a perfect fit. Adjusted R² is similar to R² except that it accounts for the number of predictors in the model, so adjusted R² statistics from models with different numbers of predictors can be compared; it is never larger than R² and can fall below 0 for a very poor fit. R² values should not be compared between models with different numbers of predictors, because adding a term never decreases R².
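A minimal sketch of both statistics, assuming an ordinary least-squares fit computed with NumPy's `np.polyfit`. It uses the common adjustment 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the polynomial order (the number of predictor terms x, x², ..., xᵏ). The function `r2_scores` and the sample values are hypothetical, and Analyse-it's own computation may differ in detail.

```python
import numpy as np


def r2_scores(x, y, order):
    """Return (R2, adjusted R2) for a least-squares polynomial fit of `order`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.polyval(np.polyfit(x, y, order), x)
    ss_res = np.sum((y - fitted) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares about the mean
    n, k = len(y), order                      # k predictor terms: x, x**2, ..., x**order
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2


# Hypothetical observations with a curved trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 8.2, 13.8, 22.1, 31.7, 44.2, 57.9]
for order in (1, 2, 3):
    r2, adj = r2_scores(x, y, order)
    print(f"order {order}: R2 = {r2:.4f}, adjusted R2 = {adj:.4f}")
```

R² can only rise as the order increases, whereas adjusted R² rises only if the extra term improves the fit by more than its cost in degrees of freedom.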
Analysis of variance is used to test the hypothesis that the polynomial fits the data better than the mean alone. The total variance, the variation of the response about its mean, is partitioned into variance explained by the polynomial regression model and residual variance (the differences between the observations and the fitted line). An F-test then compares the model mean square with the residual mean square to determine whether the model explains significantly more variance than would be expected by chance. The F statistic is the ratio of these mean squares, and the p-value is the probability of obtaining an F statistic at least this large if the polynomial fit were no better than fitting to the mean. If the p-value is significant, the polynomial fits better than the mean.
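The partition and F-test described above can be sketched as follows, assuming an ordinary least-squares fit (NumPy `np.polyfit`) and the upper tail of SciPy's F distribution (`scipy.stats.f.sf`). The function name, dictionary keys and data are hypothetical; this is not Analyse-it's own implementation.

```python
import numpy as np
from scipy import stats


def polynomial_anova(x, y, order):
    """ANOVA entries comparing a least-squares polynomial fit with the mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.polyval(np.polyfit(x, y, order), x)
    n = len(y)
    ss_total = np.sum((y - y.mean()) ** 2)   # variation of the response about its mean
    ss_resid = np.sum((y - fitted) ** 2)     # variation left around the fitted curve
    ss_model = ss_total - ss_resid           # variation explained by the polynomial
    df_model, df_resid = order, n - order - 1
    ms_model = ss_model / df_model           # model mean square
    ms_resid = ss_resid / df_resid           # residual mean square
    f_stat = ms_model / ms_resid
    p_value = stats.f.sf(f_stat, df_model, df_resid)  # P(F >= f_stat) under the null
    return {"SS model": ss_model, "SS residual": ss_resid, "SS total": ss_total,
            "DF model": df_model, "DF residual": df_resid,
            "F": f_stat, "p": p_value}


# Hypothetical observations.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 8.2, 13.8, 22.1, 31.7, 44.2, 57.9]
for key, value in polynomial_anova(x, y, 2).items():
    print(f"{key:>12}: {value:.4g}")
```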
The regression coefficients table shows the coefficients and confidence intervals for the intercept and for each power of the predictor. Together, the coefficients form the equation of the polynomial fit, which is used to predict the response from the predictor:
$$y = a + bx + cx^2 + dx^3 + \cdots$$
where a is the intercept coefficient (the value of Y where the fitted curve crosses the Y axis, that is, at X = 0), and b, c, d (and so on) are the coefficients for the first, second, third (and higher) powers of the X predictor variable.
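The coefficients in the equation above can be reproduced outside the report with a least-squares fit. A minimal sketch using NumPy's `np.polyfit`, which returns coefficients highest power first, so they are reversed here to match the order a, b, c, ... used in the equation. The helper names `fit_polynomial` and `predict` are hypothetical.

```python
import numpy as np


def fit_polynomial(x, y, order):
    """Least-squares polynomial fit.

    Returns coefficients [a, b, c, ...] in increasing order of power,
    matching y = a + b*x + c*x**2 + ...
    """
    high_first = np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), order)
    return high_first[::-1]


def predict(coefs, x):
    """Evaluate a + b*x + c*x**2 + ... for a scalar or array x."""
    x = np.asarray(x, dtype=float)
    return sum(c * x ** i for i, c in enumerate(coefs))


# Hypothetical data lying exactly on y = 1 + 2x + 3x**2.
xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * t + 3 * t * t for t in xs]
coefs = fit_polynomial(xs, ys, 2)
print(coefs)               # approximately [1, 2, 3]
print(predict(coefs, 10))  # approximately 1 + 20 + 300 = 321
```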
IMPORTANT When using the equation to predict values of Y, use the coefficients to at least 4 significant figures. The report displays the values to 4 significant figures, but the cells hold the coefficients to much higher precision, so refer to the cells where possible: a rounding error in a higher-order coefficient is multiplied by a large power of X.
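To see why precision matters, the sketch below evaluates a cubic with full-precision coefficients and again with the same coefficients rounded to 4 significant figures, and prints the difference at increasing X. The coefficient values are invented for illustration and do not come from any real Analyse-it report; `round_sig` and `predict` are hypothetical helpers.

```python
def round_sig(value, sig=4):
    """Round `value` to `sig` significant figures."""
    return float(f"{value:.{sig}g}")


def predict(coefs, x):
    """Evaluate a + b*x + c*x**2 + ... with coefs = [a, b, c, ...]."""
    return sum(c * x ** i for i, c in enumerate(coefs))


# Hypothetical cubic coefficients, as if read from the full-precision cells.
full = [12.34567, -3.456789, 0.1234567, -0.002345678]
rounded = [round_sig(c, 4) for c in full]   # as displayed in the report

for x in (1, 10, 100, 1000):
    error = predict(rounded, x) - predict(full, x)
    print(f"x = {x:>4}: prediction error from rounding = {error:.6g}")
```

The error is negligible near X = 1 but grows rapidly with X, because the small rounding error in the cubic coefficient is multiplied by X³.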
The p-value for each regression coefficient is the probability of obtaining a coefficient at least as far from zero as the one observed if the null hypothesis, that the term has no effect on the response, is in fact true. A significant p-value, or a coefficient confidence interval that doesn't span zero, implies the term contributes significantly to the response. If the higher-order (higher-power) terms are not significant, the polynomial order could be reduced without adversely affecting the goodness of fit.
METHOD The p-value is calculated using the t-distribution.
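A sketch of a coefficients table of this kind, assuming the standard least-squares results: standard errors from the diagonal of s²(XᵀX)⁻¹, two-sided p-values from the t-distribution with n - (order + 1) degrees of freedom, and confidence intervals of coefficient ± t·SE. It uses NumPy and `scipy.stats.t`; the function name and data are hypothetical and the report's exact computation may differ.

```python
import numpy as np
from scipy import stats


def coefficient_table(x, y, order, level=0.95):
    """Coefficients, standard errors, t, two-sided p and confidence limits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, order + 1, increasing=True)   # columns 1, x, x**2, ...
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    n, p = X.shape
    df = n - p                                     # residual degrees of freedom
    s2 = resid @ resid / df                        # residual mean square
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    t = coefs / se
    pvals = 2 * stats.t.sf(np.abs(t), df)          # two-sided test of coefficient = 0
    t_crit = stats.t.ppf(0.5 + level / 2, df)
    return coefs, se, t, pvals, coefs - t_crit * se, coefs + t_crit * se


# Hypothetical observations, quadratic fit.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 8.2, 13.8, 22.1, 31.7, 44.2, 57.9]
table = coefficient_table(x, y, 2)
labels = ["intercept", "x", "x^2"]
for row in zip(labels, *table):
    print("{:>9}  coef={:.4g}  SE={:.3g}  t={:.3g}  p={:.3g}  CI=({:.4g}, {:.4g})".format(*row))
```

A term whose confidence interval excludes zero will also have a two-sided p-value below 1 - level, which is the equivalence the text relies on.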
A scatter plot allows visual assessment of the relationship between the response and predictor variables. The plot can show the fit, the confidence interval for the fit, and prediction intervals.
To modify the scatter plot:
The scatter plot shows the predictor (X axis) plotted against the response (Y axis). The polynomial fit (thick blue line) is surrounded by confidence interval bands (blue lines) showing the range likely to contain the true polynomial curve in the underlying population. If the fit is good, the confidence interval bands lie close to the polynomial fit line. Prediction interval bands (black lines) show where future individual observations are likely to lie; they are always wider than the confidence bands because they also include the scatter of individual observations about the curve.
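The two kinds of band can be sketched with the standard least-squares formulas, assuming Analyse-it uses the usual definitions: at a new point x₀ with design row v = (1, x₀, x₀², ...), the confidence half-width is t·√(s²·vᵀ(XᵀX)⁻¹v) and the prediction half-width is t·√(s² + s²·vᵀ(XᵀX)⁻¹v). The function name and data are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_intervals(x, y, order, x_new, level=0.95):
    """Fitted values with confidence and prediction limits at x_new."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, order + 1, increasing=True)          # columns 1, x, x**2, ...
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    df = len(y) - (order + 1)
    s2 = np.sum((y - X @ coefs) ** 2) / df                 # residual mean square
    xtx_inv = np.linalg.inv(X.T @ X)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    X_new = np.vander(x_new, order + 1, increasing=True)
    fit = X_new @ coefs
    # Variance of the estimated mean response at each new x: s2 * v' (X'X)^-1 v.
    var_fit = s2 * np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)
    t_crit = stats.t.ppf(0.5 + level / 2, df)
    ci = t_crit * np.sqrt(var_fit)        # half-width of the confidence band
    pi = t_crit * np.sqrt(var_fit + s2)   # half-width of the prediction band
    return fit, fit - ci, fit + ci, fit - pi, fit + pi


# Hypothetical observations; x = 9 lies outside the observed range.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 8.2, 13.8, 22.1, 31.7, 44.2, 57.9]
fit, ci_lo, ci_hi, pi_lo, pi_hi = fit_intervals(x, y, 2, [2.5, 4.5, 9.0])
for row in zip([2.5, 4.5, 9.0], fit, ci_lo, ci_hi, pi_lo, pi_hi):
    print("x={:.1f}  fit={:.3f}  CI=({:.3f}, {:.3f})  PI=({:.3f}, {:.3f})".format(*row))
```

The bands widen away from the centre of the data, most sharply when extrapolating beyond it.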
The residual plot allows visual assessment of the distance of each observation from the fitted line. The plot can show raw or standardized residuals, optionally with a histogram.
The residual plot shows the distance of each observation from the fitted line. If the assumption of constant variance is met, the points should be scattered randomly within a band of constant width. When the residuals are shown standardized, they should lie within roughly ±2 to 3 SDs of zero if the fit is good; standardized residuals further out, at ±4 SDs, should be investigated as possible outliers.
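A sketch of standardized residuals, assuming "standardized" means internally studentized residuals, eᵢ / (s·√(1 - hᵢᵢ)), where hᵢᵢ is the leverage of observation i. Analyse-it may use a different definition. One consequence of this definition is that no value can exceed √(n - p) in magnitude (p = order + 1), so in small samples a residual cannot reach ±4 even for a gross outlier. The function name and data are hypothetical.

```python
import numpy as np


def standardized_residuals(x, y, order):
    """Internally studentized residuals e_i / (s * sqrt(1 - h_ii))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, order + 1, increasing=True)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs                       # raw residuals
    n, p = X.shape
    s = np.sqrt(resid @ resid / (n - p))        # residual standard deviation
    # Leverages: diagonal of the hat matrix X (X'X)^-1 X'.
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return resid / (s * np.sqrt(1.0 - h))


# Hypothetical data: y = x**2 with small alternating noise and one gross outlier.
x = np.arange(20)
y = x ** 2 + 0.5 * (-1.0) ** x
y[10] += 50.0
r = standardized_residuals(x, y, 2)
print("largest |standardized residual| at index", int(np.argmax(np.abs(r))))
```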
The histogram of the residuals allows visual assessment of the assumption that the measurement errors in the response variable are normally distributed.