Survival (reliability) analysis studies the time until an event of interest occurs.
In survival analysis, the variable of interest is the time until an event occurs. Time may be measured in hours, days, weeks, months, years, or some other unit. The event of interest is often death, recovery from surgery, product failure, or some other characteristic. A key feature of survival analysis is that the event of interest may not occur during the study's time frame, or contact may be lost with a participant part way through the study. In these cases, the observation is said to be censored, and we have only partial, but still important, information about the survival time. Most survival data are right-censored: the true survival time is cut off at the right side of the observed time, so the observed survival time is shorter than the true survival time.
The Kaplan-Meier estimator (also known as the product-limit estimator) is a non-parametric statistic used to estimate the empirical survival function.
The Kaplan-Meier survival probability at failure time t(f) is the probability of surviving past the previous failure time t(f-1) multiplied by the conditional probability of surviving past time t(f) given survival to at least time t(f).
A plot of the survival probabilities at each ordered failure time produces an empirical survival function (or survival curve) - a step-function starting with a horizontal line at a survival probability of 1 and a step down at each failure time. A censored observation does not produce any step in the survival function, but it is sometimes useful to denote such observations.
Key statistics for describing the survival function are its quartiles (the 25th, 50th, and 75th percentiles). Another statistic is the mean survival time, the area under the survival curve. It is often less useful because many survival functions do not drop to 0 (due to some observations being censored at the end of the study). A workaround, particularly when comparing multiple survival functions, is to restrict the area under the curve to an interval [0, t] common to all the survival curves (the restricted mean survival time).
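The product-limit recursion can be sketched in pure Python. The function name, toy data, and variable names below are illustrative assumptions, not from any particular library:

```python
# Kaplan-Meier product-limit estimator: a minimal pure-Python sketch.

def kaplan_meier(times, observed):
    """Return a list of (failure_time, survival_prob) for right-censored data.

    times    : observed times (event or censoring)
    observed : 1 if the event occurred, 0 if the observation was censored
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, d in data if tt == t and d == 1)
        removed = sum(1 for tt, _ in data if tt == t)
        if deaths > 0:
            # S(t_f) = S(t_{f-1}) * (1 - deaths_f / n_at_risk_f)
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Censored observations leave the risk set but produce no step.
        n_at_risk -= removed
        i += removed
    return curve

# Toy data: 5 subjects; subjects 2 and 5 are censored.
times = [6, 7, 10, 15, 19]
observed = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, observed):
    print(f"t={t}: S(t)={s:.3f}")
```

Note how the censored times (7 and 19) reduce the number at risk but contribute no step, exactly as described above.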
Plot the Kaplan-Meier survival function to visualize the survival probability over time.
An equality hypothesis test formally tests if two or more population survival functions are different.
A common goal of survival analysis is to compare the survival function of two or more groups. A formal hypothesis test tests whether two or more survival functions are statistically equal in some overall sense. That is, there is no evidence to suggest that the true (population) survival functions are different. The most popular test is the Log-rank test, although various other tests differ in how much weight they give to the survival probabilities at the start or end of the distribution.
The null hypothesis states that there is no difference between the survival functions, against the alternative hypothesis that at least one of the survival functions is different. When the test p-value is small, you can reject the null hypothesis and conclude the population survival functions differ.
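The two-group log-rank test can be sketched in pure Python. The function name and toy data below are illustrative assumptions (a real analysis would normally use a statistics package):

```python
import math

def logrank_test(times1, events1, times2, events2):
    """Return (chi_square, p_value) for the two-group log-rank test (1 df)."""
    data = [(t, d, 0) for t, d in zip(times1, events1)] + \
           [(t, d, 1) for t, d in zip(times2, events2)]
    failure_times = sorted({t for t, d, _ in data if d == 1})
    o_minus_e = 0.0   # sum over failure times of observed - expected deaths in group 1
    var = 0.0         # variance of that sum (hypergeometric)
    for t in failure_times:
        n = sum(1 for tt, _, _ in data if tt >= t)               # at risk, both groups
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 0)   # at risk, group 1
        d = sum(1 for tt, dd, _ in data if tt == t and dd == 1)  # deaths at t
        d1 = sum(1 for tt, dd, g in data if tt == t and dd == 1 and g == 0)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p

# Toy data: group 2 clearly survives longer than group 1.
times1, events1 = [3, 5, 7, 9, 11], [1, 1, 0, 1, 1]
times2, events2 = [12, 14, 16, 18, 20], [1, 1, 1, 0, 1]
chi2, p = logrank_test(times1, events1, times2, events2)
```

Swapping the two groups gives the same statistic, since the observed-minus-expected sum simply changes sign.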
Tests for the equality of survival functions and their properties and assumptions.
| Test | Purpose |
|---|---|
| Log-rank | Test if the survival functions are equal (uses equal weights). The most popular test. Equivalent to the Mantel-Haenszel test of the hypothesis that the stratum-specific odds ratio is equal to one. |
| Wilcoxon | Test if the survival functions are equal with more weight on differences between the survival functions at smaller values of time (higher weight on early deaths). This test is appropriate when hazard functions vary in ways other than proportionally and when censoring patterns are similar across groups. |
| Tarone-Ware | Test if the survival functions are equal using weights equal to the square root of the number at risk, rather than the number at risk used in Wilcoxon. Like the Wilcoxon test, it gives larger weights (although not as large) to earlier failure times. Although less susceptible to the censoring pattern in the data than the Wilcoxon test, this can remain a problem if large differences in these patterns exist between groups. |
| Fleming-Harrington | Test if the survival functions are equal using weights km^p multiplied by (1 − km)^q, where km is the Kaplan-Meier estimate just before each failure time. When p > q, the test weights earlier failures more than later ones. When p < q, the opposite is true, and more weight is given to later failures. The Fleming-Harrington test reduces to the log-rank test when p = 0 and q = 0. |
Plot multiple survival curves to make comparisons between them.
A semi-parametric model that describes the relationship between a time-to-event response variable and one or more explanatory variables using a hazard function.
The Cox proportional hazards model is expressed as the hazard function h(t) = h0(t) × exp(∑βiXi). The formula expresses the hazard at time t for an individual with a given set of explanatory variables as the product of two quantities. The first quantity is h0(t), the baseline hazard function. The second quantity is the exponent of the linear sum ∑βiXi, where the sum is over the p explanatory variables. An important feature of the proportional hazards model is that the baseline hazard function is a function of t but does not include the Xs. Therefore, it is unnecessary to specify the form of the baseline hazard function in estimating the parameters. Also, the second quantity depends only on the Xs and is independent of time. The unknown model parameters are estimated by maximizing the partial likelihood.
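The two quantities in the formula can be illustrated numerically. The baseline hazard value, coefficients, and covariate values below are made up for illustration:

```python
import math

def hazard(h0_t, betas, xs):
    """Hazard under a proportional hazards model: h(t|X) = h0(t) * exp(sum(b_i * x_i))."""
    return h0_t * math.exp(sum(b * x for b, x in zip(betas, xs)))

betas = [0.7, -0.2]  # e.g. treatment indicator, centered age (illustrative)
h_treated = hazard(0.05, betas, [1, 0])
h_untreated = hazard(0.05, betas, [0, 0])

# The ratio of the two hazards is exp(0.7): the baseline hazard h0(t)
# cancels, which is why the hazards are "proportional" at every t.
print(h_treated / h_untreated)
```

Repeating the calculation with any other value of h0(t) gives the same ratio, which is the key consequence of the model's structure.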
Fit a proportional hazards model to describe the relationship between a time to event response variable and one or more predictor variables.
Hazard ratios are the increase or decrease in the hazard associated with a change in a predictor, with all other predictors held constant.
Although mathematically the log hazard ratios are easier to work with when fitting models, when it comes to interpretation, it is more natural to use the hazard ratios. A hazard ratio is defined as the hazard for one individual (or group) divided by the hazard for another individual or group. A value of 1 indicates no change in the hazard. Values between 0 and 1 indicate a decrease in the hazard. Values greater than 1 indicate an increase in the hazard.
When the predictor variable is a categorical variable, the hazard ratio is the increase or decrease in hazard over the baseline category.
When a predictor variable is a continuous variable, the hazard ratio is the increase or decrease in hazard for a change in the predictor variable. The default is for a 1 unit change in the predictor, although it may be more appropriate to use a larger unit, such as for a change of 10 units of the predictor variable.
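For a continuous predictor, the hazard ratio for a k-unit change is exp(k·β), which is the 1-unit hazard ratio raised to the power k. A short sketch with an illustrative coefficient:

```python
import math

beta = 0.02                      # illustrative log hazard ratio per 1-unit change
hr_per_1 = math.exp(beta)        # hazard ratio for a 1-unit change
hr_per_10 = math.exp(10 * beta)  # hazard ratio for a 10-unit change
```

A 2% hazard increase per unit looks negligible, but the equivalent 10-unit ratio is easier to communicate.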
When a term includes an interaction, the hazard ratio will depend on the interacting variables' values, so you will need to use multiple hazard ratios to describe the behavior at different levels of interest.
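With an interaction between, say, a treatment indicator and age, the hazard ratio for treatment must be evaluated at chosen ages. The coefficients below are illustrative assumptions:

```python
import math

b_treat, b_inter = 0.5, -0.03  # illustrative: treatment main effect, treatment-by-age interaction

def treatment_hr(age):
    """Hazard ratio for treatment vs control at a given age:
    HR(treatment | age) = exp(b_treat + b_inter * age)."""
    return math.exp(b_treat + b_inter * age)

hr_at_40 = treatment_hr(40)
hr_at_70 = treatment_hr(70)
```

A single hazard ratio cannot summarize this model: here the treatment effect weakens with age, so both values are needed.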
Estimate the hazard ratios using a proportional hazards model.
A baseline survival function is the survival function corresponding to the baseline hazard h0(t) in a proportional hazards model.
A major advantage of the proportional hazards model is that it is semi-parametric: the baseline hazard h0(t) does not need to be specified to estimate the parameters. However, the baseline survival function is sometimes useful for understanding the change in survival over time, and it can be combined with the estimated coefficients and specific covariate values to estimate the survival experience of subgroups of particular interest. The baseline survival function is the estimated survival when all covariates are set to zero. In many cases, the survival function at covariate values of zero isn't easily interpreted, and some recommend that the mean be subtracted from continuous variables before they are analyzed. In these cases, the reference survival function, evaluated at the reference group for categorical variables and at the mean for continuous variables, is usually used.
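Given an estimated baseline survival function, the survival curve for specific covariate values follows from S(t|X) = S0(t)^exp(∑βiXi). The baseline values and coefficients below are illustrative assumptions:

```python
import math

# Illustrative baseline survival estimates at t = 1, 2, 3.
s0 = {1: 0.95, 2: 0.88, 3: 0.80}

betas, xs = [0.7], [1]  # e.g. a treated subject (illustrative)
risk = math.exp(sum(b * x for b, x in zip(betas, xs)))

# S(t|X) = S0(t) ** exp(sum(b_i * x_i)) at each time point.
s_subject = {t: s ** risk for t, s in s0.items()}
```

Because the risk score exceeds 1 here, the subject's curve lies everywhere below the baseline curve.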
A likelihood ratio or Wald χ² test formally tests whether the model fits the data better than no model.
It is common to test whether the model fits the data better than the null model containing no covariates.
A χ² test formally tests whether the reduction in twice the negative log-likelihood, relative to the null model, is statistically significant. The null hypothesis states that all the parameters for the covariates are zero, against the alternative that at least one parameter is not equal to zero. When the p-value is small, you can reject the null hypothesis and conclude that at least one parameter is not zero.
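The likelihood ratio statistic is twice the difference in log-likelihoods, compared against a χ² distribution with degrees of freedom equal to the number of covariates. The log-likelihood values below are illustrative, and the closed-form survival function used is exact only for even degrees of freedom:

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X > x); exact closed form for even df."""
    m = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(m))

# Illustrative log-likelihoods for a null model and a 2-covariate model.
loglik_null, loglik_model, df = -120.4, -114.9, 2

lr = 2 * (loglik_model - loglik_null)  # likelihood ratio statistic
p = chi2_sf_even_df(lr, df)
```

For 2 degrees of freedom the survival function reduces to exp(−x/2), so the p-value here is exp(−5.5).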
A likelihood ratio or Wald χ² test formally tests whether a term contributes to the model.
In most modeling analyses the aim is a model that describes the relationship using as few terms as possible. It is therefore of interest to look at each term in the model to decide if the term is providing any useful information.
A χ² test for each term is a formal hypothesis test to determine if the term provides useful information to the model. The null hypothesis states that the term does not contribute to the model, against the alternative hypothesis that it does. When the p-value is small, you can reject the null hypothesis and conclude that the term does contribute to the model.
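A per-term Wald test squares the coefficient-to-standard-error ratio and compares it to a 1-df χ² distribution. The coefficient and standard error below are illustrative assumptions:

```python
import math

# Illustrative estimated coefficient and its standard error for one term.
beta, se = 0.47, 0.21

z = beta / se                          # Wald z statistic
chi2 = z ** 2                          # Wald chi-square statistic (1 df)
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value via the normal tail
```

The 5% critical value for a 1-df χ² is about 3.84, so this illustrative term would be retained at that level.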
When a term is not deemed to contribute statistically to the model, you may consider removing it. However, you should be cautious of removing terms that are known to contribute by some underlying mechanism, regardless of the statistical significance of a hypothesis test, and recognize that removing a term can alter the effect of other terms.