A model that describes the relationship between a categorical response variable and one or more explanatory variables using a logit or probit function.
A generalized linear model (GLM) generalizes the linear model by relating the linear predictor to the response variable via a link function and by allowing an error distribution other than the normal distribution. The unknown model parameters are estimated using maximum-likelihood estimation.
| Type | Description |
|---|---|
| Logit / Logistic | Fit a model to a binary response variable expressed by the logit link function (log odds) and binomial error distribution. Note: This model is very common because the parameter estimates can be interpreted as log-odds or back-transformed into odds ratios. |
| Probit | Fit a model to a binary response variable expressed by the probit function and binomial error distribution. |
Fit a simple logistic regression model to describe the relationship between a single predictor variable and a binary response variable.
Fit a multiple logistic regression model to describe the relationship between many predictor variables and a binary response variable.
Fit a simple probit regression model to describe the relationship between a single predictor variable and a binary response variable.
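As a sketch of these tasks (using simulated data and the `statsmodels` formula interface, which are illustrative assumptions rather than part of this article), a simple logistic and a simple probit model can be fitted as follows:

```python
# Illustrative sketch: fit simple logit and probit models to a simulated
# binary response. The data and variable names (x, y) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# Simulate a binary response whose log odds depend linearly on x.
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
y = rng.binomial(1, p)
df = pd.DataFrame({"x": x, "y": y})

# Logit link (logistic regression) and probit link, both with a
# binomial error distribution.
logit_fit = smf.logit("y ~ x", data=df).fit(disp=0)
probit_fit = smf.probit("y ~ x", data=df).fit(disp=0)
print(logit_fit.params)
print(probit_fit.params)
```

Both calls estimate the parameters by maximum likelihood; only the link function differs.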
Parameter estimates (also called coefficients) are associated with a one-unit change of the predictor, all other predictors being held constant.
A coefficient describes the size of the contribution of that predictor; a large coefficient indicates that the variable strongly influences the probability of that outcome, while a near-zero coefficient indicates that variable has little influence on the probability of that outcome. A positive sign indicates that the explanatory variable increases the probability of the outcome, while a negative sign indicates that the variable decreases the probability of that outcome. A confidence interval for each parameter shows the uncertainty in the estimate.
When the model contains categorical variables, the interpretation of the coefficients is more complex. For each term involving a categorical variable, a number of dummy predictor variables are created to estimate the effect of each level. There are different ways to code the predictors for a categorical variable; the most common method in logit/probit regression is called reference cell coding or dummy coding. In reference cell coding, the first category acts as a baseline, and you can interpret the other coefficients as an increase or decrease over the baseline category.
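A minimal sketch of reference cell coding, assuming simulated data and the `statsmodels`/`patsy` formula interface (the variable names `group` and `y` are hypothetical):

```python
# Illustrative sketch: dummy (reference cell) coding for a categorical
# predictor. Level "A" is the baseline; the fitted coefficients for
# B and C are changes in log odds relative to A.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
group = rng.choice(["A", "B", "C"], size=n)
# True log odds differ by group level.
logodds = np.select([group == "A", group == "B", group == "C"],
                    [-0.5, 0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-logodds)))
df = pd.DataFrame({"group": group, "y": y})

# C(group) creates dummy variables for levels B and C, with A as baseline.
fit = smf.logit("y ~ C(group)", data=df).fit(disp=0)
print(fit.params)
```

The coefficients appear under names such as `C(group)[T.B]`, each read as the change in log odds over the baseline category.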
Odds ratios are the increase or decrease in odds associated with a change in the predictor, all other predictors being held constant.
Although mathematically the log odds ratios are easier to work with, when it comes to interpretation it is more natural to use the odds ratios. Unlike the log odds ratio, the odds ratio is always positive. A value of 1 indicates no change. Values between 0 and 1 indicate a decrease in the probability of the outcome event. Values greater than 1 indicate an increase in the probability of the outcome event.
When the predictor variable is a categorical variable, the odds ratio is the increase or decrease in odds over the baseline category.
When a predictor variable is a continuous variable, the odds ratio is the increase or decrease in odds for a change in the predictor variable. The default is for a 1 unit change in the predictor, although it may be more appropriate to use a larger unit, such as for a change of 10 units of the predictor variable.
When a term includes an interaction, the odds ratio depends on the values of the interacting variables, so you will need to use multiple odds ratios to describe the behavior at different levels of interest.
Estimate the odds ratios using a logistic model.
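One way to sketch this, assuming simulated data and `statsmodels` (the names `x` and `y` are hypothetical): odds ratios are obtained by exponentiating the fitted log-odds coefficients, and their confidence intervals transform the same way.

```python
# Illustrative sketch: back-transform log-odds coefficients into odds
# ratios, with confidence intervals, from a fitted logistic model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.8 * x))))
df = pd.DataFrame({"x": x, "y": y})

fit = smf.logit("y ~ x", data=df).fit(disp=0)
odds_ratios = np.exp(fit.params)      # exponentiate log odds -> odds ratios
or_ci = np.exp(fit.conf_int())        # CI endpoints transform the same way
# Odds ratio for a 10-unit change in x: exponentiate 10 * coefficient.
or_per_10_units = np.exp(10 * fit.params["x"])
print(odds_ratios)
```

The last line illustrates the point above about choosing a larger unit of change: the odds ratio for a 10-unit change is the 1-unit odds ratio raised to the 10th power.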
A likelihood ratio or Wald χ² test formally tests the hypothesis of whether the model fits the data better than no model.
It is common to test whether the model fits the data better than the null model, which contains no parameters other than the intercept.
A χ² test formally tests whether the reduction in deviance relative to the null model is statistically significant. The null hypothesis states that all the parameters for the covariates are zero, against the alternative that at least one parameter is not equal to zero. When the p-value is small, you can reject the null hypothesis and conclude that at least one parameter is not zero.
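A minimal sketch of this test, assuming simulated data and `statsmodels` (which reports the likelihood ratio test against the intercept-only null model on the fitted result):

```python
# Illustrative sketch: likelihood ratio test of the fitted model against
# the intercept-only null model, using simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.0 * x))))
df = pd.DataFrame({"x": x, "y": y})

fit = smf.logit("y ~ x", data=df).fit(disp=0)
# llr is the likelihood ratio chi-square statistic; llr_pvalue is its
# p-value on df_model degrees of freedom.
print("LR chi2:", fit.llr, "df:", fit.df_model, "p-value:", fit.llr_pvalue)
```

A small p-value here leads to rejecting the null hypothesis that all covariate parameters are zero.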
A likelihood ratio or Wald χ² test formally tests the hypothesis of whether a term contributes to the model.
In most modeling analyses the aim is a model that describes the relationship using as few terms as possible. It is therefore of interest to look at each term in the model to decide if the term is providing any useful information.
A χ² test for each term is a formal hypothesis test to determine if the term provides useful information to the model. The null hypothesis states that the term does not contribute to the model, against the alternative hypothesis that it does. When the p-value is small, you can reject the null hypothesis and conclude that the term does contribute to the model.
When a term is not deemed to contribute statistically to the model, you may consider removing it. However, you should be cautious of removing terms that are known to contribute by some underlying mechanism, regardless of the statistical significance of a hypothesis test, and recognize that removing a term can alter the effect of other terms.
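As a sketch of the per-term test, assuming hypothetical simulated data: one common approach is a likelihood ratio test comparing the full model against a reduced model with the term of interest dropped.

```python
# Illustrative sketch: likelihood ratio test for a single term by
# comparing nested models. Here the "sex" term has no true effect
# in the simulated data, so the test typically does not reject.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 400
height = rng.normal(170, 10, size=n)
sex = rng.choice(["F", "M"], size=n)
logodds = -0.1 + 0.05 * (height - 170)   # only height matters here
y = rng.binomial(1, 1 / (1 + np.exp(-logodds)))
df = pd.DataFrame({"height": height, "sex": sex, "y": y})

full = smf.logit("y ~ height + sex", data=df).fit(disp=0)
reduced = smf.logit("y ~ height", data=df).fit(disp=0)
# Twice the difference in log-likelihood is chi-square distributed
# under the null, with df equal to the number of dropped parameters.
lr_stat = 2 * (full.llf - reduced.llf)
p_value = stats.chi2.sf(lr_stat, df=full.df_model - reduced.df_model)
print("LR stat:", lr_stat, "p-value:", p_value)
```

A small p-value would suggest the term contributes to the model; a large one is consistent with removing it, subject to the cautions above.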
Fit model analysis: study requirements and dataset layout.
Use a column for each predictor variable (Height, Sex) and a column for the response variable (Weight); each row has the values of the variables for a case (Subject).
| Subject (optional) | Height | Sex | Weight |
|---|---|---|---|
| 1 | 175 | M | 65 |
| 2 | 180 | M | 70 |
| 3 | 160 | F | 90 |
| 4 | 190 | F | 55 |
| 5 | 180 | M | 100 |
| 6 | 150 | F | 55 |
| 7 | 140 | M | 75 |
| 8 | 160 | M | 80 |
| 9 | 165 | F | 80 |
| 10 | 180 | M | 95 |
| … | … | … | … |
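The layout above can be sketched as a pandas DataFrame, one column per variable and one row per case (only the first few rows from the table are reproduced here):

```python
# Illustrative sketch of the dataset layout: one column per predictor
# (Height, Sex) and response (Weight), one row per case (Subject).
import pandas as pd

df = pd.DataFrame({
    "Subject": [1, 2, 3, 4],
    "Height":  [175, 180, 160, 190],
    "Sex":     ["M", "M", "F", "F"],
    "Weight":  [65, 70, 90, 55],
})
print(df)
```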