Logistic / Probit fit

A model that describes the relationship between a categorical response variable and one or more explanatory variables using a logit or probit function.

Generalized linear models

A generalized linear model (GLM) extends the linear model by relating the linear predictor to the response variable through a link function, and by allowing an error distribution other than the normal distribution. The unknown model parameters are estimated using maximum-likelihood estimation.
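
For reference, the definition above can be written out as follows. The notation (g for the link function, η for the linear predictor, β for the parameters, μ for the event probability, ℓ for the log-likelihood) is introduced here for illustration and is not taken from the guide; the log-likelihood shown is the standard one for a binary response with a binomial error distribution.

```latex
g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip},
\qquad \mu_i = \Pr(Y_i = 1)

\ell(\beta) = \sum_{i=1}^{n} \bigl[\, y_i \log \mu_i + (1 - y_i) \log (1 - \mu_i) \,\bigr]
```

Maximum-likelihood estimation chooses the β that makes ℓ(β) as large as possible.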

Type              Description
Logit / Logistic  Fit a model to a binary response variable using the logit link function (log odds) and a binomial error distribution.
                  Note: This model is very common because the parameter estimates can be interpreted on the log-odds scale or back-transformed into odds ratios.
Probit            Fit a model to a binary response variable using the probit link function and a binomial error distribution.
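
As a rough illustration of the two link functions, the sketch below uses only the Python standard library. The function names are hypothetical and chosen for this example; this is not Analyse-it code.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def logit(p):
    """Logit link: the log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse logit: map a linear predictor value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit(p):
    """Probit link: the standard normal quantile of p, with 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(p)


def inv_probit(eta):
    """Inverse probit: the standard normal cumulative probability of eta."""
    return _STD_NORMAL.cdf(eta)
```

Both links map a probability of 0.5 to a linear predictor of 0. They differ mainly in scale and in how quickly the tails approach 0 and 1.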

Fitting a simple logistic regression

Fit a simple logistic regression model to describe the relationship between a single predictor variable and a binary response variable.

  1. Select a cell in the dataset.
  2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Fit Model, and then click Logit / Logistic.
    The analysis task pane opens.
  3. In the Y drop-down list, select the binary response variable.
  4. In the Event drop-down list, select the outcome of interest.
  5. In the X drop-down list, select the predictor variable.
  6. Click Calculate.
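
To give a feel for what a simple logistic fit computes, here is a minimal sketch of maximum-likelihood estimation by Newton-Raphson for one predictor. It assumes the response is already coded 1 for the outcome of interest (the event) and 0 otherwise. The function name and the toy data are hypothetical, and nothing here should be read as Analyse-it's own algorithm.

```python
import math


def fit_simple_logistic(x, y, max_iter=25, tol=1e-10):
    """Return (intercept, slope) maximizing the binomial log-likelihood.

    x: predictor values; y: 1 for the event, 0 otherwise.
    Assumes x is not constant and the two outcomes overlap in x
    (with complete separation the estimates do not converge).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the score (gradient) and the information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


# Hypothetical toy data, for illustration only.
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
y_demo = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope = fit_simple_logistic(x_demo, y_demo)
```

At the returned estimates the score equations are satisfied, meaning the observed and fitted numbers of events agree.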

Fitting an advanced logistic model

Fit an advanced logistic model to describe the relationship between many predictor variables and a binary response variable.

  1. Select a cell in the dataset.
  2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Fit Model, and then click Logit / Logistic.
    The analysis task pane opens.
  3. In the Model drop-down menu, select Advanced.
  4. In the Y drop-down list, select the response variable.
  5. In the Event drop-down list, select the outcome of interest.
  6. In the Available variables list, select the predictor variable(s):
    • To select a single variable, click the variable.
    • To select multiple variables, click the first variable then hold down the CTRL key and click each additional variable.
    • To select a range of variables, click the first variable then hold down the SHIFT key and click the last variable in the range.
  7. Optional: Click the drop-down menu arrow next to the Available variables list, and then select the measurement scale of the variable.

    If the measurement scale is not set, Analyse-it tries to determine whether the variable is on a categorical or continuous scale. After you add a variable to a model, the icon alongside the variable indicates the assumed scale.

  8. Click Add to add a variable, click Factorial to add the factorial terms, or click Polynomial to add polynomial terms.
  9. To add the interaction between two or more variables, in the Terms list box, click the first term then hold down the CTRL key and click each additional term to include in the interaction, and then click Cross.
  10. To remove a term, in the Terms list box, click the term and then click Remove.
  11. Repeat steps 6 through 10 to build the model.
  12. Click Calculate.

Fitting a simple probit regression

Fit a simple probit regression model to describe the relationship between a single predictor variable and a binary response variable.

  1. Select a cell in the dataset.
  2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Fit Model, and then click Probit.
    The analysis task pane opens.
  3. In the Y drop-down list, select the binary response variable.
  4. In the Event drop-down list, select the outcome of interest.
  5. In the X drop-down list, select the predictor variable.
  6. Click Calculate.

Parameter estimates

Parameter estimates (also called coefficients) describe the change in the linear predictor (the log odds, in a logit model) associated with a one-unit change of the predictor, all other predictors being held constant.

A coefficient describes the size of the contribution of that predictor; a large coefficient indicates that the variable strongly influences the probability of that outcome, while a near-zero coefficient indicates that variable has little influence on the probability of that outcome. A positive sign indicates that the explanatory variable increases the probability of the outcome, while a negative sign indicates that the variable decreases the probability of that outcome. A confidence interval for each parameter shows the uncertainty in the estimate.

When the model contains categorical variables, the interpretation of the coefficients is more complex. For each term involving a categorical variable, a number of dummy predictor variables are created to predict the effect of each different level. There are different ways to code the predictors for a categorical variable; the most common method in logit/probit regression is called reference cell coding or dummy coding. In reference cell coding, the first category acts as a baseline, and you can interpret the other coefficients as an increase or decrease over the baseline category.
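
Reference cell coding can be sketched as below. The function name is hypothetical, and taking the first level in sorted order as the baseline is an assumption made for this example; the guide only says that the first category is the baseline.

```python
def reference_cell_coding(values, levels=None):
    """Build 0/1 dummy columns for a categorical variable.

    The first level is the baseline and gets no column, so a variable
    with k levels produces k - 1 dummy predictors.
    """
    if levels is None:
        levels = sorted(set(values))  # assumption: sorted order defines "first"
    baseline, others = levels[0], levels[1:]
    names = [f"{level} vs {baseline}" for level in others]
    rows = [[1 if value == level else 0 for level in others] for value in values]
    return names, rows


names, rows = reference_cell_coding(["A", "B", "C", "A"])
# names -> ['B vs A', 'C vs A']
# rows  -> [[0, 0], [1, 0], [0, 1], [0, 0]]
```

A row of all zeros identifies the baseline category, which is why each remaining coefficient reads as a difference from the baseline.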

Odds ratio estimates

Odds ratios are the increase or decrease in odds associated with a change of the predictor, all other predictors being held constant.

Although the log odds ratios are mathematically easier to work with, the odds ratios are more natural to interpret. Unlike the log odds ratio, the odds ratio is always positive. A value of 1 indicates no change. Values between 0 and 1 indicate a decrease in the probability of the outcome event. Values greater than 1 indicate an increase in the probability of the outcome event.

When the predictor variable is a categorical variable, the odds ratio is the increase or decrease in odds over the baseline category.

When a predictor variable is a continuous variable, the odds ratio is the increase or decrease in odds for a change in the predictor variable. The default is for a 1 unit change in the predictor, although it may be more appropriate to use a larger unit, such as for a change of 10 units of the predictor variable.
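
The effect of choosing a larger unit can be sketched as follows for a logit model. The coefficient and standard error are hypothetical numbers, and the Wald-type interval is an assumption made for this example; the guide does not say which interval method Analyse-it uses.

```python
import math


def odds_ratio(coef, se, units=1.0, z=1.959964):
    """Odds ratio for a change of `units` in a continuous predictor.

    coef and se are the logit coefficient and its standard error.
    z is the standard normal quantile for the confidence level
    (1.959964 gives roughly 95%). Assumes units > 0.
    Returns (estimate, lower limit, upper limit).
    """
    estimate = math.exp(units * coef)
    lower = math.exp(units * (coef - z * se))
    upper = math.exp(units * (coef + z * se))
    return estimate, lower, upper


# Hypothetical coefficient of 0.05 per unit with standard error 0.02.
per_1_unit = odds_ratio(0.05, 0.02)
per_10_units = odds_ratio(0.05, 0.02, units=10)
```

The odds ratio for a 10-unit change is the 1-unit odds ratio raised to the power of 10, not 10 times the 1-unit odds ratio.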

When a term includes an interaction, the odds ratio depends on the values of the interacting variables, so you need multiple odds ratios to describe the behavior at different levels of interest.
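
To see why an interaction requires several odds ratios, consider a hypothetical logit model with a two-level group variable (coded 1 and 0), a continuous variable, and their interaction. The function names and coefficient arguments below are illustrative assumptions, not Analyse-it output.

```python
import math


def group_odds_ratio_at(b_group, b_interaction, level):
    """Odds ratio for group 1 versus group 0 at a given value (level)
    of the continuous variable."""
    return math.exp(b_group + b_interaction * level)


def continuous_odds_ratio_within(b_continuous, b_interaction, group, units=1.0):
    """Odds ratio for a change of `units` in the continuous variable,
    within one group (group is 0 or 1)."""
    return math.exp(units * (b_continuous + b_interaction * group))


# Hypothetical coefficients: the group odds ratio changes with the level.
levels_of_interest = [16, 25, 30, 40, 60]
group_ors = [group_odds_ratio_at(0.5, 0.01, level) for level in levels_of_interest]
```

When the interaction coefficient is zero, both functions return the same value at every level, and a single odds ratio is enough.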

Estimating odds ratios

Estimate the odds ratios using a logistic model.

  1. Activate the analysis report worksheet.
  2. On the Analyse-it ribbon tab, in the Fit Model group, click Odds Ratios.
    The analysis task pane Odds Ratios panel opens.
  3. In the Odds Ratios grid, in the Comparison column:
    • For a continuous variable, type the change in units of the predictor for the odds ratio. The default is a 1-unit change in the predictor, but for some predictors the odds ratio for a 10- or 100-unit change may be more interpretable.
    • For a categorical variable, select the type of comparison:
    Option              Description
    All pairs           Compare all pairs of categories against each other. For example, with 3 categories the comparisons are 1v2, 1v3, 2v1, 2v3, 3v1, 3v2.
    All distinct pairs  Compare all unique pairs of categories against each other. For example, with 3 categories the comparisons are 2v1, 3v1, 3v2.
    Against reference   Compare all categories against the reference category (the first category in the categorical variable). For example, with 3 categories the comparisons are 2v1, 3v1.
  4. If the model has interaction terms, in the At column for each continuous variable involved in an interaction, type the levels of interest as a list separated by commas (or your list delimiter). For example, if Age is involved in an interaction with Gender, you might use 16, 25, 30, 40, and 60 as the levels of interest at which the Gender Female/Male odds ratios are computed. For a continuous variable involved in an interaction with a categorical variable, the odds ratio for the unit change is computed at each level of the categorical variable. For example, if the Age unit is 10 years and Age is involved in an interaction with Gender, the odds ratio for a change of 10 years is computed separately for Females and for Males.
  5. In the Confidence interval edit box, type the confidence level.
  6. Click Recalculate.
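
The three comparison types for a categorical variable can be enumerated with a short sketch. The function name and the string labels are hypothetical; each pair is read as "first v second", and the first category is assumed to be the reference.

```python
from itertools import combinations, permutations


def comparisons(categories, kind):
    """List the (numerator, denominator) category pairs for an odds ratio."""
    if kind == "all pairs":
        # Every ordered pair, so both 1v2 and 2v1 appear.
        return list(permutations(categories, 2))
    if kind == "all distinct pairs":
        # Each unordered pair once, later category over earlier.
        return [(b, a) for a, b in combinations(categories, 2)]
    if kind == "against reference":
        reference = categories[0]
        return [(c, reference) for c in categories[1:]]
    raise ValueError(f"unknown comparison type: {kind}")


all_pairs = comparisons([1, 2, 3], "all pairs")
# -> [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
distinct = comparisons([1, 2, 3], "all distinct pairs")
# -> [(2, 1), (3, 1), (3, 2)]
```

With k categories there are k(k-1) ordered pairs, k(k-1)/2 distinct pairs, and k-1 comparisons against the reference.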

Effect of model hypothesis test

A likelihood ratio or Wald X² test formally tests the hypothesis of whether the model fits the data better than no model.

It is common to test whether the model fits the data better than the null model, which contains no predictor terms.

A X² test formally tests whether the improvement in fit over the null model is statistically significant. The null hypothesis states that all the parameters for the covariates are zero, against the alternative that at least one parameter is not equal to zero. When the p-value is small, you can reject the null hypothesis and conclude that at least one parameter is not zero.
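
A likelihood ratio test of the model against the null model can be sketched as below for the special case of one covariate parameter (1 degree of freedom). The function names are hypothetical, and the restriction to 1 degree of freedom is an assumption that keeps the p-value computable with the standard library alone; models with more parameters need a general chi-squared distribution function.

```python
import math


def log_likelihood(y, p):
    """Binomial log-likelihood for 0/1 outcomes y and fitted probabilities p
    (each p strictly between 0 and 1)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def null_log_likelihood(y):
    """Log-likelihood of the null model, which predicts the overall event
    proportion for every case. Assumes both outcomes occur in y."""
    rate = sum(y) / len(y)
    return log_likelihood(y, [rate] * len(y))


def likelihood_ratio_test_1df(loglik_null, loglik_model):
    """Return (statistic, p-value) for a 1 degree-of-freedom test."""
    statistic = max(2.0 * (loglik_model - loglik_null), 0.0)
    # For 1 df, P(chi-squared > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

The statistic is twice the gain in log-likelihood from adding the covariate. A large gain gives a small p-value.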

Effect of term hypothesis test

A likelihood ratio or Wald X² test formally tests the hypothesis of whether a term contributes to the model.

In most modeling analyses the aim is a model that describes the relationship using as few terms as possible. It is therefore of interest to look at each term in the model to decide if the term is providing any useful information.

A X² test for each term is a formal hypothesis test to determine if the term provides useful information to the model. The null hypothesis states that the term does not contribute to the model, against the alternative hypothesis that it does. When the p-value is small, you can reject the null hypothesis and conclude that the term does contribute to the model.

When a term is not deemed to contribute statistically to the model, you may consider removing it. However, you should be cautious of removing terms that are known to contribute by some underlying mechanism, regardless of the statistical significance of a hypothesis test, and recognize that removing a term can alter the effect of other terms.
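
For a term with a single parameter, a Wald test can be sketched from the coefficient and its standard error. The function name is hypothetical, and the sketch covers only the 1 degree-of-freedom case; a categorical term with several dummy parameters needs the multi-parameter form of the test.

```python
import math


def wald_test_1df(coef, se):
    """Return (statistic, p-value) for testing that one coefficient is zero.

    The statistic is (coef / se) squared, compared against a chi-squared
    distribution with 1 degree of freedom.
    """
    z = coef / se
    statistic = z * z
    # Two-sided normal tail probability, equal to the chi-squared(1) tail.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return statistic, p_value
```

A coefficient that is small relative to its standard error gives a statistic near zero and a p-value near 1, which is the situation in which you might consider removing the term.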

Study design

Fit model analysis study requirements and dataset layout.

Requirements

  • 1 or more categorical or quantitative predictor variables.
  • A categorical or quantitative response variable.

Dataset layout

Use a column for each predictor variable (Height, Sex) and a column for the response variable (Weight); each row has the values of the variables for a case (Subject).

Subject (optional)  Height  Sex  Weight
1                   175     M    65
2                   180     M    70
3                   160     F    90
4                   190     F    55
5                   180     M    100
6                   150     F    55
7                   140     M    75
8                   160     M    80
9                   165     F    80
10                  180     M    95

Statistical Reference Guide v6.15