# (p.393) Appendix 7 Theory on Proportional Hazards Regression

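In Chapter 9, our first step toward formulating a model that did not require a specific parametric form for the effect of time was the piecewise constant hazard, or piecewise exponential, model shown in equation 9.2. By employing the proportional hazards form for the model, in which the hazard during the $j$th interval is $\lambda_j \exp\{\mathbf{X}_i\boldsymbol{\beta}\}$, we can obtain the survival function for an individual with covariates $\mathbf{X}_i$:

$$S(t; \mathbf{X}_i) = \exp\Big\{-\sum_{j} \lambda_{j}\, t_{ij} \exp\{\mathbf{X}_i\boldsymbol{\beta}\}\Big\}$$

where $i$ and $j$ are indices for the subject and the interval, respectively, and $t_{ij}$ is the observed length of follow-up for the $i$th individual during the $j$th interval. The probability density function for the failure times follows directly:

$$f(t; \mathbf{X}_i) = \lambda_j \exp\{\mathbf{X}_i\boldsymbol{\beta}\}\, \exp\Big\{-\sum_{j'} \lambda_{j'}\, t_{ij'} \exp\{\mathbf{X}_i\boldsymbol{\beta}\}\Big\}$$

where $\lambda_j$ is the hazard for the final interval of follow-up, the one in which the individual actually failed.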
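The survival and density functions described above can be checked numerically. The following is a minimal sketch, assuming the hazard in interval $j$ is $\lambda_j \exp\{\mathbf{X}\boldsymbol{\beta}\}$; the function names, the cut points, and the hazard values are illustrative assumptions, not quantities taken from the text.

```python
import math

def time_in_intervals(t, cuts):
    """Split follow-up time t into the time spent in each interval.

    cuts holds the lower boundaries [0, c1, c2, ...]; the last interval
    is treated as open-ended.
    """
    exposures = []
    for j, lower in enumerate(cuts):
        upper = cuts[j + 1] if j + 1 < len(cuts) else math.inf
        exposures.append(max(0.0, min(t, upper) - lower))
    return exposures

def relative_risk(x, beta):
    """exp(x . beta), the covariate multiplier on the interval hazard."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

def survival(t, x, beta, lambdas, cuts):
    """S(t; x) = exp(-sum_j lambda_j * t_j * exp(x . beta))."""
    rr = relative_risk(x, beta)
    exposures = time_in_intervals(t, cuts)
    cum_hazard = sum(lam * tj * rr for lam, tj in zip(lambdas, exposures))
    return math.exp(-cum_hazard)

def density(t, x, beta, lambdas, cuts):
    """f(t; x) = (hazard in the interval containing t) * S(t; x)."""
    j = max(idx for idx, lower in enumerate(cuts) if lower <= t)
    return lambdas[j] * relative_risk(x, beta) * survival(t, x, beta, lambdas, cuts)

# Illustrative (made-up) inputs: three intervals starting at 0, 1, and 2.
cuts = [0.0, 1.0, 2.0]
lambdas = [0.1, 0.2, 0.3]
s = survival(1.5, [1.0], [0.0], lambdas, cuts)
f = density(1.5, [1.0], [0.0], lambdas, cuts)
```

With $\boldsymbol{\beta} = 0$ the cumulative hazard at $t = 1.5$ is $0.1 \times 1 + 0.2 \times 0.5 = 0.2$, so the sketch returns $S = e^{-0.2}$ and $f = 0.2\,e^{-0.2}$.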

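Let $\delta_{ij}$ be an indicator that takes the value 1 if the $i$th individual fails during the $j$th interval and 0 otherwise. The sum of this indicator over all intervals, $\delta_{i+}$, is 1 for a complete observation and 0 for one that is censored. We can obtain the likelihood from

$$L = \prod_i f(t_i; \mathbf{X}_i)^{\delta_{i+}}\, S(t_i; \mathbf{X}_i)^{1-\delta_{i+}} = \prod_i \prod_j \big[\lambda_j \exp\{\mathbf{X}_i\boldsymbol{\beta}\}\big]^{\delta_{ij}} \exp\big\{-\lambda_j\, t_{ij} \exp\{\mathbf{X}_i\boldsymbol{\beta}\}\big\}$$

If we group the individuals who share the same covariate values, $\mathbf{X}_k$, and let $d_{kj} = \sum \delta_{ij}$ be the sum of the indicators over the subjects in the $k$th risk group, it is clear that this quantity represents the total number of failures for the group during interval $j$. Likewise, the sum of the observed follow-up times during interval $j$ for subjects in group $k$ is $T_{kj} = \sum t_{ij}$, and the likelihood simplifies to

$$L = \prod_k \prod_j \big[\lambda_j \exp\{\mathbf{X}_k\boldsymbol{\beta}\}\big]^{d_{kj}} \exp\big\{-\lambda_j\, T_{kj} \exp\{\mathbf{X}_k\boldsymbol{\beta}\}\big\}$$

The kernel of this likelihood for the number of failures in the $k$th group during the $j$th interval is identical to the Poisson with mean $T_{kj}\lambda_j \exp\{\mathbf{X}_k\boldsymbol{\beta}\}$, which allows us to make use of this distribution when finding maximum likelihood estimates of the model parameters. With the hazard a log-linear function of the covariates, $\log\big[\lambda_j \exp\{\mathbf{X}_k\boldsymbol{\beta}\}\big] = \log\lambda_j + \mathbf{X}_k\boldsymbol{\beta}$, the log likelihood becomes

$$\ell = \sum_k \sum_j \Big\{ d_{kj}\big[\log\lambda_j + \mathbf{X}_k\boldsymbol{\beta}\big] - T_{kj}\,\lambda_j \exp\{\mathbf{X}_k\boldsymbol{\beta}\} \Big\}$$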
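The Poisson equivalence can be illustrated numerically. The sketch below assumes a log-linear hazard, $\log\lambda_j + \mathbf{X}_k\boldsymbol{\beta}$, and writes the failure counts as `d[k][j]` and the follow-up totals as `T[k][j]`. These names and the small data set are invented for illustration. It compares the grouped survival log likelihood with a Poisson log likelihood whose mean is the follow-up time multiplied by the hazard, and shows that the two differ only by a quantity that does not involve the parameters.

```python
import math

def grouped_loglik(d, T, X, log_lambda, beta):
    """sum_k sum_j d_kj*(log lambda_j + x_k.beta) - T_kj*lambda_j*exp(x_k.beta)."""
    total = 0.0
    for k, x in enumerate(X):
        eta = sum(xi * bi for xi, bi in zip(x, beta))
        for j, a in enumerate(log_lambda):
            log_hazard = a + eta
            total += d[k][j] * log_hazard - T[k][j] * math.exp(log_hazard)
    return total

def poisson_loglik(d, T, X, log_lambda, beta):
    """Poisson log likelihood for counts d_kj with mean T_kj * hazard_kj."""
    total = 0.0
    for k, x in enumerate(X):
        eta = sum(xi * bi for xi, bi in zip(x, beta))
        for j, a in enumerate(log_lambda):
            mu = T[k][j] * math.exp(a + eta)
            total += d[k][j] * math.log(mu) - mu - math.lgamma(d[k][j] + 1)
    return total

# Invented data: two risk groups (rows) by two intervals (columns).
d = [[2, 1], [4, 3]]                       # failures
T = [[100.0, 80.0], [90.0, 60.0]]          # person-time of follow-up
X = [[0.0], [1.0]]                         # one binary covariate per group

params_a = ([math.log(0.02), math.log(0.03)], [0.5])
params_b = ([math.log(0.05), math.log(0.01)], [-1.2])

# The difference between the two log likelihoods is the same at both
# parameter values, so they share the same maximizer.
diff_a = poisson_loglik(d, T, X, *params_a) - grouped_loglik(d, T, X, *params_a)
diff_b = poisson_loglik(d, T, X, *params_b) - grouped_loglik(d, T, X, *params_b)
```

Because the difference is free of the parameters, any routine that maximizes a Poisson likelihood with $\log T_{kj}$ as an offset would, under these assumptions, yield the same estimates.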

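As we saw in our discussion of survival curve estimates in Chapter 5, a completely nonparametric model with respect to the time component could be realized by passing to the limit, letting the interval widths go to 0 and the number of intervals to infinity. A somewhat simpler representation of the method of analysis results from establishing a log likelihood that depends only on the regression parameters, $\boldsymbol{\beta}$. For each interval containing a failure, we have a nuisance parameter involving the product of the interval width and the hazard, $\Delta\lambda_j$. Intervals that do not include deaths drop out entirely, and the only relevant information is provided by intervals that include failure times, so that the likelihood given in equation A7.1 can be expressed in terms of a product over the intervals. For the $j$th interval, the individuals that are available (that is, those with nonzero observed times during the interval) constitute a risk set, $R_j$. Thus, writing $\mathbf{X}_j$ for the covariates of the individual who fails in the $j$th interval, the log likelihood becomes, apart from a constant,

$$\ell = \sum_j \Big\{ \log(\Delta\lambda_j) + \mathbf{X}_j\boldsymbol{\beta} - \Delta\lambda_j \sum_{i \in R_j} \exp\{\mathbf{X}_i\boldsymbol{\beta}\} \Big\}$$

We can eliminate the nuisance parameter $\Delta\lambda_j$ by setting $\partial\ell/\partial(\Delta\lambda_j) = 0$, which gives $\Delta\lambda_j = 1 \big/ \sum_{i \in R_j} \exp\{\mathbf{X}_i\boldsymbol{\beta}\}$. Substituting this value into the log likelihood and dropping the constant yields an expression that depends only on $\boldsymbol{\beta}$:

$$\ell(\boldsymbol{\beta}) = \sum_j \Big\{ \mathbf{X}_j\boldsymbol{\beta} - \log \sum_{i \in R_j} \exp\{\mathbf{X}_i\boldsymbol{\beta}\} \Big\}$$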
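The elimination of the nuisance parameters can be traced in a few lines of code. This is a rough sketch only. It assumes no tied failure times, takes the risk set at a failure time to be everyone whose follow-up is at least that long, and sets each nuisance term to the reciprocal of the risk-set sum of $\exp\{\mathbf{X}_i\boldsymbol{\beta}\}$. The function names and the four-subject data set are invented.

```python
import math

def linear_predictor(x, beta):
    return sum(xi * bi for xi, bi in zip(x, beta))

def risk_set(times, t_j):
    """Indices of subjects still under follow-up at time t_j."""
    return [i for i, t_i in enumerate(times) if t_i >= t_j]

def partial_loglik(times, events, X, beta):
    """sum over failures of x_j.beta - log(sum_{i in R_j} exp(x_i.beta))."""
    eta = [linear_predictor(x, beta) for x in X]
    total = 0.0
    for j, failed in enumerate(events):
        if failed:
            denom = sum(math.exp(eta[i]) for i in risk_set(times, times[j]))
            total += eta[j] - math.log(denom)
    return total

def full_loglik_at_profile(times, events, X, beta):
    """Full log likelihood with each nuisance term set to 1 / (risk-set sum)."""
    eta = [linear_predictor(x, beta) for x in X]
    total = 0.0
    for j, failed in enumerate(events):
        if failed:
            denom = sum(math.exp(eta[i]) for i in risk_set(times, times[j]))
            nuisance = 1.0 / denom          # value that zeroes the derivative
            total += math.log(nuisance) + eta[j] - nuisance * denom
    return total

# Invented data: follow-up times, failure indicators (1 = failed), one covariate.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
X = [[1.0], [0.0], [1.0], [0.0]]

pl = partial_loglik(times, events, X, [0.7])
full = full_loglik_at_profile(times, events, X, [0.7])
```

In this sketch `full - pl` equals minus the number of failures whatever the value of `beta`, which is the constant dropped when passing to the log likelihood in the regression parameters alone.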

A further extension of this likelihood involves the inclusion of time-dependent covariates, $\mathbf{Z}(t)$. The contribution of each failure to the overall likelihood once again involves a consideration of the risk set at each failure time, that is, those individuals under active follow-up at a given failure time. The contribution of the $j$th failure becomes

$$\frac{\exp\{\mathbf{Z}_j(t_j)\boldsymbol{\beta}\}}{\sum_{i \in R_j} \exp\{\mathbf{Z}_i(t_j)\boldsymbol{\beta}\}}$$

where the covariates of every individual in the risk set are evaluated at the $j$th failure time, $t_j$. The overall likelihood is found by multiplying these elements over the entire set of observed failure times.
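To make the role of the failure time concrete, the sketch below multiplies the contributions over the observed failures, re-evaluating each subject's covariate at every failure time. It assumes untied failure times and a single 0/1 covariate that switches on at a subject-specific time. The `switch_times` values, the helper `z_at`, and the data are hypothetical and chosen only for illustration.

```python
import math

# Invented data: follow-up times, failure indicators, and the time at which
# each subject's covariate changes from 0 to 1 (inf = never changes).
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
switch_times = [1.0, 4.0, math.inf, 3.0]

def z_at(i, t):
    """Covariate vector of subject i evaluated at time t (hypothetical form)."""
    return [1.0 if t >= switch_times[i] else 0.0]

def score(i, t, beta):
    """exp(z_i(t) . beta) for subject i at time t."""
    return math.exp(sum(z * b for z, b in zip(z_at(i, t), beta)))

def td_partial_likelihood(times, events, beta):
    """Product over failures of score_j(t_j) / sum_{i in R_j} score_i(t_j)."""
    lik = 1.0
    for j, failed in enumerate(events):
        if failed:
            t_j = times[j]
            at_risk = [i for i, t_i in enumerate(times) if t_i >= t_j]
            lik *= score(j, t_j, beta) / sum(score(i, t_j, beta) for i in at_risk)
    return lik

lik = td_partial_likelihood(times, events, [math.log(2.0)])
```

With a relative risk of 2, the failure at time 2 contributes 2/5, the failure at time 5 contributes 1/3 (the other subject at risk has switched by then), and the last failure contributes 1, so the product is 2/15.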