Survival/Reliability

Survival/reliability analysis examines the time until an event of interest occurs.

In survival analysis, the variable of interest is the time until an event occurs. Time is measured in hours, days, weeks, months, years, or some other unit. The event of interest is often death, recovery from surgery, product failure, or some other outcome. A key feature of survival analysis is that the event of interest may not occur during the study's time frame, or contact with the participant may be lost partway through the study. In these cases, the observation is said to be censored, and we have only partial, but still important, information about the survival time. Most survival data is right-censored: the true survival time is cut off at the right side of the observed time, so the observed survival time is shorter than the true survival time.
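As a rough illustration of right-censoring, the sketch below turns true event times into what a study with a fixed end date would actually record. The function name `observe`, the example times, and the 12-month cut-off are all hypothetical and chosen only for illustration.

```python
def observe(true_time, study_end):
    """Return (observed_time, event) under right-censoring at study_end.

    event is True when the event was seen, False when the time is censored.
    """
    if true_time <= study_end:
        return true_time, True
    return study_end, False


# Hypothetical true survival times in months; the study stops at 12 months.
true_times = [3, 8, 15, 20]
observed = [observe(t, 12) for t in true_times]
```

The last two subjects are recorded at 12 months with the event flag set to False: we know only that their true survival time exceeds 12 months.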

Kaplan-Meier survival function

The Kaplan-Meier estimator (also known as the product-limit estimator) is a non-parametric statistic used to estimate the empirical survival function.

The Kaplan-Meier survival probability at failure time t(f) is the probability of surviving past the previous failure time t(f-1) multiplied by the conditional probability of surviving past time t(f) given survival to at least time t(f).

A plot of the survival probabilities at each ordered failure time produces an empirical survival function (or survival curve) - a step-function starting with a horizontal line at a survival probability of 1 and a step down at each failure time. A censored observation does not produce any step in the survival function, but it is sometimes useful to denote such observations.
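The product-limit calculation can be sketched in a few lines of Python. This is a minimal illustration, not the Analyse-it implementation; the function name `kaplan_meier` and the example data are assumptions made for the sketch.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time.

    times  -- observed times
    events -- True if the event occurred, False if the time is censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Conditional probability of surviving past t, given at risk at t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Failures and censored observations both leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical data: 8 subjects, 4 of them censored.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [True, True, False, True, True, False, False, False]
curve = kaplan_meier(times, events)
```

Censored observations produce no step of their own, but they shrink the number at risk and so enlarge the later steps. Because the longest observed time here is censored, the estimated curve never reaches 0.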

Key statistics for describing the survival function are the quartiles (the 25th, 50th, and 75th percentiles of survival time). Another statistic is the mean survival time, the area under the survival curve. It is often less useful because many survival functions do not drop to 0 (due to some observations being censored at the end of the study). A workaround, particularly when comparing multiple survival functions, is to restrict the area under the curve to an interval [0, t] that is common to all the survival curves.
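A sketch of how these summaries can be read off a step-function survival curve follows. It assumes the curve is given as a list of (time, survival probability) steps; the function names and the example curve are hypothetical, and the quantile convention used (the first time at which the survival probability falls to 1 - p or below) is one of several in use.

```python
def quantile_time(curve, p):
    """Smallest failure time at which S(t) <= 1 - p, or None if never reached."""
    for t, s in curve:
        if s <= 1 - p:
            return t
    return None


def restricted_mean(curve, tau):
    """Area under the step survival curve on the interval [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Final rectangle from the last step before tau up to tau.
    return area + prev_s * (tau - prev_t)


# Hypothetical survival curve that never drops to 0.
curve = [(2, 0.875), (3, 0.75), (5, 0.6), (8, 0.45)]
median = quantile_time(curve, 0.5)
upper_quartile = quantile_time(curve, 0.75)
rmst = restricted_mean(curve, 10)
```

Here the median is estimable, but `upper_quartile` is None because the curve never falls to 0.25. This is the same reason an unrestricted mean is not well defined for such a curve.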

Plotting a Kaplan-Meier survival curve

Plot the Kaplan-Meier survival function to visualize the survival probability over time.

  1. Select a cell in the dataset.
  2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Survival/Reliability, and then click Kaplan-Meier Survival Function.
    The analysis task pane opens.
  3. In the Y (time to event) drop-down list, select the variable containing the time until the event of interest.
  4. Optional: In the Censor drop-down list, select the state variable that indicates whether the time was censored or the event of interest occurred, and then in the Event drop-down list, select the state that indicates the time was censored.
  5. Optional: In the X (factor) drop-down list, select the grouping variable.
  6. Optional: To show the survival probabilities for each failure time, on the Analyse-it ribbon tab, in the Survival/Reliability group, click S(t).
  7. Click Calculate.

Equality of survival functions test

An equality hypothesis test formally tests if two or more population survival functions are different.

A common goal of survival analysis is to compare the survival functions of two or more groups. A formal hypothesis test assesses whether the survival functions are statistically equal in some overall sense, that is, whether there is any evidence to suggest that the true (population) survival functions differ. The most popular test is the log-rank test; other tests differ in how much weight they give to the survival probabilities at the start or end of the distribution.

The null hypothesis states that there is no difference between the survival functions, against the alternative hypothesis that at least one of the survival functions is different. When the test p-value is small, you can reject the null hypothesis and conclude the population survival functions differ.

Tests for the equality of survival functions

Tests for the equality of survival functions and their properties and assumptions.

Test Purpose
Log-rank

Test if the survival functions are equal (uses equal weights). The most popular test. Equivalent to the Mantel-Haenszel test of the hypothesis that the stratum-specific odds ratio is equal to one.

Wilcoxon

Test if the survival functions are equal with more weight on differences between the survival functions at smaller values of time (higher weights on early death). This test is appropriate when hazard functions vary in ways other than proportionally and when censoring patterns are similar across groups.

Tarone-Ware

Test if the survival functions are equal using weights equal to the sqrt(number at risk) rather than the number at risk used in Wilcoxon. Like the Wilcoxon test, it gives larger weights (although not as large) to earlier failure times. Although less susceptible to the censoring pattern in the data than Wilcoxon’s test, this could remain a problem if large differences in these patterns exist between groups.

Fleming-Harrington

Test if the survival functions are equal using weights KM^p × (1 − KM)^q, where KM is the Kaplan-Meier estimate of the survival function. When p > q, the test weights earlier failures more than later ones. When p < q, the opposite is true, and more weight is given to later failures than to earlier ones. The Fleming-Harrington test reduces to the log-rank test when p = 0 and q = 0, because every weight is then equal to 1.
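The tests in this table share one structure and differ only in the weight applied at each failure time. The following is a minimal two-group sketch under stated assumptions: groups are labeled 0 and 1, the Fleming-Harrington weight uses the pooled Kaplan-Meier estimate just before each failure time, and the p-value comes from a chi-square distribution with 1 degree of freedom. The function name and example data are hypothetical, and this is not the Analyse-it implementation.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank", p=0.0, q=0.0):
    """Two-group weighted log-rank test; groups holds 0/1 labels.

    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    data = sorted(zip(times, events, groups))
    n = len(data)                       # number at risk, both groups
    n1 = sum(g for _, _, g in data)     # number at risk in group 1
    km = 1.0                            # pooled KM estimate just before t
    u = v = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d > 0:
            if weight == "logrank":
                w = 1.0
            elif weight == "wilcoxon":
                w = float(n)
            elif weight == "tarone-ware":
                w = math.sqrt(n)
            else:  # "fleming-harrington"
                w = km ** p * (1 - km) ** q
            # Observed minus expected failures in group 1.
            u += w * (d1 - d * n1 / n)
            if n > 1:
                v += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
            km *= 1 - d / n
        n -= removed
        n1 -= removed1
    chi2 = u * u / v
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical data: group 0 fails early, group 1 fails late, no censoring.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
chi2_lr, p_lr = weighted_logrank(times, events, groups)
chi2_w, p_w = weighted_logrank(times, events, groups, weight="wilcoxon")
```

Comparing the statistics from different weights on the same data shows how sensitive the conclusion is to where along the time axis the groups differ.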

Comparing two or more survival functions

Plot multiple survival curves to make comparisons between them.

  1. Select a cell in the dataset.
  2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Survival/Reliability, and then click Compare Survival Functions.
    The analysis task pane opens.
  3. In the Y (time to event) drop-down list, select the variable containing the time until the event of interest.
  4. Optional: In the Censor drop-down list, select the state variable that indicates whether the time was censored or the event of interest occurred, and then in the Event drop-down list, select the state that indicates the time was censored.
  5. In the X (factor) drop-down list, select the grouping variable.
  6. In the Survival function task panel, under the Equality of survival functions hypothesis test check box, in the Test drop-down list, select the statistical test to perform.
  7. Click Calculate.

Proportional hazards fit

A semi-parametric model that describes the relationship between a time to event response variable and one or more explanatory variables using a hazard function.

The Cox proportional hazards model is expressed as the hazard function h(t, X) = h0(t) × exp(∑βiXi). The formula expresses the hazard at time t for an individual with a given set of explanatory variables X as the product of two quantities. The first quantity is h0(t), the baseline hazard function. The second quantity is the exponential of the linear sum ∑βiXi, where the sum is over the p explanatory variables. An important feature of the proportional hazards model is that the baseline hazard function is a function of t but does not involve the Xs. Therefore, it is unnecessary to specify the form of the baseline hazard function to estimate the parameters. Also, the second quantity depends only on the Xs and does not involve t. The unknown model parameters are estimated using maximum-likelihood estimation.
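To show why the baseline hazard drops out of the estimation, here is a minimal sketch that fits a model with a single explanatory variable by Newton-Raphson on the Cox partial likelihood, in which h0(t) cancels from every term. It assumes the Breslow treatment of tied times; the function name and data are hypothetical, and it is not a description of how Analyse-it fits the model.

```python
import math


def fit_cox_single(times, events, x, iterations=25):
    """Newton-Raphson fit of a one-covariate Cox model (Breslow ties).

    Returns (beta, standard_error).
    """
    beta = 0.0
    info = 0.0
    for _ in range(iterations):
        score = info = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored times contribute only through risk sets
            # Risk set: everyone still under observation at t_i.
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:
                    r = math.exp(beta * x_j)
                    s0 += r
                    s1 += x_j * r
                    s2 += x_j * x_j * r
            score += x_i - s1 / s0                # first derivative
            info += s2 / s0 - (s1 / s0) ** 2      # minus second derivative
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, 1 / math.sqrt(info)


# Hypothetical data: x = 1 marks a group that tends to fail earlier.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta, se = fit_cox_single(times, events, x)
```

Only the ordering of the times and the composition of each risk set enter the calculation, which is why no functional form for h0(t) is needed. A positive `beta` here means a higher hazard for the x = 1 group.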

Fitting a proportional hazard model

Fit a proportional hazards model to describe the relationship between a time to event response variable and one or more predictor variables.

  1. Select a cell in the dataset.
  2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Fit Model, and then click Proportional Hazard.
    The analysis task pane opens.
  3. In the Model drop-down menu, select Advanced.
  4. In the Y drop-down list, select the response variable.
  5. Optional: In the Censor drop-down list, select the censor variable, and then in the Event drop-down list, select the censor event. An observation is censored when the individual was lost to follow-up or the event of interest had not occurred by the end of the study.
  6. In the Available variables list, select the predictor variable(s):
    • To select a single variable, click the variable.
    • To select multiple variables, click the first variable then hold down the CTRL key and click each additional variable.
    • To select a range of variables, click the first variable then hold down the SHIFT key and click the last variable in the range.
  7. Optional: Click the drop-down menu arrow next to the Available variables list, and then select the measurement scale of the variable.

    If the measurement scale is not set, Analyse-it tries to determine if the variable is categorical or continuous scale. After you add a variable to a model, the icon alongside the variable indicates the assumed scale.

  8. Click Add to add a variable, click Factorial to add the factorial terms, or click Polynomial to add polynomial terms.
  9. To add the interaction between two or more variables, in the Terms list box, click the first term then hold down the CTRL key and click each additional term to include in the interaction, and then click Cross.
  10. To remove a term, in the Terms list box, click the term and then click Remove.
  11. Repeat steps 6 through 10 to build the model.
  12. Click Calculate.

Hazard ratio estimates

Hazard ratios are the increase or decrease in the hazard associated with a change in the predictor, all other predictors being held constant.

Although mathematically the log hazard ratios are easier to work with when fitting models, when it comes to interpretation, it is more natural to use the hazard ratios. A hazard ratio is defined as the hazard for one individual (or group) divided by the hazard for another individual (or group). A value of 1 indicates no change in the hazard. Values between 0 and 1 indicate a decrease in the hazard. Values greater than 1 indicate an increase in the hazard.

When the predictor variable is a categorical variable, the hazard ratio is the increase or decrease in hazard over the baseline category.

When a predictor variable is a continuous variable, the hazard ratio is the increase or decrease in hazard for a change in the predictor variable. The default is for a 1 unit change in the predictor, although it may be more appropriate to use a larger unit, such as for a change of 10 units of the predictor variable.

When a term includes an interaction, the hazard ratio will depend on the interacting variables' values, so you will need to use multiple hazard ratios to describe the behavior at different levels of interest.
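The relationship between a fitted coefficient and its hazard ratio can be sketched as follows. The sketch assumes a Wald-type confidence interval computed on the log scale and a model in which a group effect interacts with one continuous variable; the function names and coefficient values are hypothetical.

```python
import math


def hazard_ratio(beta, se, units=1.0, z=1.959964):
    """Hazard ratio and Wald interval for a change of `units` in a predictor.

    z defaults to the normal quantile for a 95% confidence level.
    """
    estimate = math.exp(beta * units)
    bounds = sorted(
        (math.exp((beta - z * se) * units), math.exp((beta + z * se) * units))
    )
    return estimate, bounds[0], bounds[1]


def interaction_hazard_ratio(beta_group, beta_interaction, level):
    """Group hazard ratio at a given level of the interacting variable."""
    return math.exp(beta_group + beta_interaction * level)


# Hypothetical coefficient: log hazard ratio of 0.05 per year of age.
hr_1, lo_1, hi_1 = hazard_ratio(0.05, 0.01)
hr_10, lo_10, hi_10 = hazard_ratio(0.05, 0.01, units=10)

# Hypothetical Gender x Age interaction: the Gender hazard ratio changes with age.
hr_by_age = {age: interaction_hazard_ratio(0.4, -0.01, age) for age in (16, 40, 60)}
```

A 10 unit change multiplies the log hazard ratio by 10, so the hazard ratio is raised to the 10th power rather than multiplied by 10. With an interaction there is no single Gender hazard ratio; in this made-up example it is above 1 at age 16, equal to 1 at age 40, and below 1 at age 60.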

Estimating hazard ratios

Estimate the hazard ratios using a proportional hazards model.

Ensure you have followed the steps in Fitting a proportional hazard model.
  1. Activate the analysis report worksheet.
  2. On the Analyse-it ribbon tab, in the Fit Model group, click Hazard Ratios.
    The analysis task pane Hazard Ratios panel opens.
  3. In the Hazard Ratios grid, in the Comparison column:
    • For a continuous variable, type the change in units of the predictor for the hazard ratio. The default is a 1 unit change in the predictor, but for some predictors the hazard ratio for a 10 or 100 unit change may be more interpretable.
    • For a categorical variable, select the type of comparison:
    Option               Description
    All pairs            Compare all pairs of categories against each other. For example, with 3 categories the comparisons are 1v2, 1v3, 2v1, 2v3, 3v1, 3v2.
    All distinct pairs   Compare all unique pairs of categories against each other. For example, with 3 categories the comparisons are 2v1, 3v1, 3v2.
    Against reference    Compare all categories against the reference category (the first category in the categorical variable). For example, with 3 categories the comparisons are 3v1, 2v1.
  4. If the model has interaction terms, in the At column for each continuous variable involved in an interaction, type the levels of interest as a comma (or your list delimiter) separated list. For example, if Age is a variable involved in an interaction with Gender, you may use 16, 25, 30, 40, and 60 as the levels of interest at which the Gender Female/Male hazard ratios will be computed. For a continuous variable involved in an interaction with a categorical variable, the unit change for each level of the categorical variable will be computed. For example, if the Age unit is 10 years and Age is involved in an interaction with Gender, the hazard ratio for a change of 10 years will be computed for Females and Males.
  5. In the Confidence interval edit box, type the confidence level.
  6. Click Recalculate.

Baseline survival function

A baseline survival function is an estimate of the survival function that corresponds to the baseline hazard h0(t) in a proportional hazards model.

A major advantage of the proportional hazards model is that it is semi-parametric: the baseline hazard function h0(t) does not need to be specified to estimate the parameters. However, the baseline survival function is sometimes useful for understanding the change in survival over time and, combined with the estimated coefficients and specific covariate values, for estimating the survival experience of subgroups of particular interest. The baseline survival function is estimated using maximum-likelihood with all covariates set to 0 (X=0). In many cases, the baseline survival function at X=0 isn't easily visualized, and some recommend subtracting the mean from continuous variables before analysis. In these cases, the reference survival function, at X=0 for categorical variables (equivalent to the reference group) and X=mean for continuous variables, is usually used.
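Under the proportional hazards model, the survival function for a given covariate pattern follows from the baseline survival function, written S0(t) here, as S(t | X) = S0(t) raised to the power exp(∑βiXi). The sketch below applies that identity; the function name, baseline curve, and coefficients are hypothetical.

```python
import math


def survival_for_covariates(baseline_curve, betas, covariates):
    """Scale a baseline survival curve to a covariate pattern.

    Uses S(t | X) = S0(t) ** exp(sum(beta_i * X_i)).
    """
    risk = math.exp(sum(b * x for b, x in zip(betas, covariates)))
    return [(t, s0 ** risk) for t, s0 in baseline_curve]


# Hypothetical baseline survival curve and coefficients (treatment, centered age).
baseline = [(1, 0.95), (2, 0.85), (5, 0.70)]
betas = [0.7, 0.03]

reference = survival_for_covariates(baseline, betas, [0, 0])
treated_older = survival_for_covariates(baseline, betas, [1, 10])
```

With all covariates at 0 the baseline curve is returned unchanged. If age has been centered on its mean, that curve describes a subject of average age in the reference group, which is usually easier to interpret than a subject of age 0.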

Effect of model hypothesis test

A likelihood ratio or Wald χ² test formally tests the hypothesis of whether the model fits the data better than no model.

It is common to test whether the model fits the data better than the null model with no parameters.

A χ² test formally tests whether the improvement in fit over the null model is statistically significant. The null hypothesis states that all the parameters for the covariates are zero against the alternative that at least one parameter is not equal to zero. When the p-value is small, you can reject the null hypothesis and conclude that at least one parameter is not zero.
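A likelihood ratio test of the whole model can be sketched from two log-likelihood values. The sketch assumes the statistic is twice the difference in log-likelihood between the fitted and null models, referred to a chi-square distribution with one degree of freedom per model parameter. The helper `chi2_sf` is a closed-form upper tail probability for integer degrees of freedom, written out so the example needs only the standard library; the log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: a finite sum of Poisson-type terms.
        term = math.exp(-x / 2)
        total = term
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    # Odd df: normal tail plus a finite correction series.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range((df - 1) // 2):
        total += term
        term *= x / (2 * k + 3)
    return total


def likelihood_ratio_test(loglik_null, loglik_model, df):
    """Return (statistic, p_value) for the model versus the null model."""
    statistic = 2 * (loglik_model - loglik_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a model with 2 parameters.
statistic, p_value = likelihood_ratio_test(-20.0, -16.5, df=2)
```

In this made-up case the statistic is 7.0 on 2 degrees of freedom, giving a p-value of about 0.03, so the null hypothesis that both parameters are zero would be rejected at the 5% level.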

Effect of term hypothesis test

A likelihood ratio or Wald χ² test formally tests the hypothesis of whether a term contributes to the model.

In most modeling analyses the aim is a model that describes the relationship using as few terms as possible. It is therefore of interest to look at each term in the model to decide if the term is providing any useful information.

A χ² test for each term is a formal hypothesis test to determine if the term provides useful information to the model. The null hypothesis states that the term does not contribute to the model, against the alternative hypothesis that it does. When the p-value is small, you can reject the null hypothesis and conclude that the term does contribute to the model.

When a term is not deemed to contribute statistically to the model, you may consider removing it. However, you should be cautious of removing terms that are known to contribute by some underlying mechanism, regardless of the statistical significance of a hypothesis test, and recognize that removing a term can alter the effect of other terms.
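For a term with a single parameter, the Wald version of this test can be sketched directly from the estimate and its standard error. The sketch assumes the statistic is the squared ratio of estimate to standard error, referred to a chi-square distribution with 1 degree of freedom; terms with several parameters need the full covariance matrix and are not covered. The function name and the values are hypothetical.

```python
import math


def wald_test(beta, se):
    """Wald chi-square test for a single-parameter term.

    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    z = beta / se
    chi_square = z * z
    # Two-sided normal tail, which equals the chi-square(1) upper tail.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return chi_square, p_value


# Hypothetical estimates for two terms in a fitted model.
chi2_age, p_age = wald_test(0.05, 0.01)
chi2_dose, p_dose = wald_test(0.20, 0.25)
```

In this made-up example the first term has a very small p-value, while the second does not reach conventional significance and would be a candidate for removal, subject to the cautions above.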

Statistical Reference Guide v6.15