## Theodore R. Holford

Print publication date: 2002

Print ISBN-13: 9780195124408

Published to British Academy Scholarship Online: September 2009

DOI: 10.1093/acprof:oso/9780195124408.001.0001


# Appendix 7 Theory on Proportional Hazards Regression

Source: Multivariate Methods in Epidemiology

Publisher: Oxford University Press

In Chapter 9, our first step toward formulating a model that did not require a specific parametric form for the effect of time was the piecewise constant hazard or piecewise exponential model, shown in equation 9.2. By employing the proportional hazards form for the model, we can obtain the survival function for an individual with covariates X_i:

where i and j are indices for the subject and the interval, respectively, and t_ij is the observed length of follow-up for the ith individual during the jth interval. The probability density function for the failure times follows directly:
where λ_j is the hazard for the final interval of follow-up, the one in which the individual actually failed.
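
As a sketch of the two displays described above, assuming the log-linear covariate effect exp(X_i β) that is used later in this appendix (the general proportional hazards form would replace it with any positive function of X_i), the survival and density functions under the piecewise constant hazard can be written as follows. The primed index j' is notation introduced here only to separate the summation index from the failure interval j.

```latex
% Survival function: the cumulative hazard is the sum over intervals of
% (interval hazard) x (time observed in that interval) x (covariate effect).
S(t; X_i) = \exp\Bigl\{ -\sum_{j'} \lambda_{j'}\, t_{ij'}\, e^{X_i \beta} \Bigr\}

% Density: hazard in the interval j where the failure occurs, times survival.
f(t; X_i) = \lambda_j\, e^{X_i \beta}
            \exp\Bigl\{ -\sum_{j'} \lambda_{j'}\, t_{ij'}\, e^{X_i \beta} \Bigr\}
```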

Let δ_ij be an indicator that takes the value 1 if the ith individual fails during the jth interval and 0 otherwise. The sum of this indicator over all intervals, δ_i+, is 1 for a complete observation and 0 for one that is censored. We can obtain the likelihood from

(A7.1)
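
A form of the likelihood consistent with these definitions is sketched below, again assuming the log-linear covariate effect exp(X_i β) and writing δ_ij for the indicator that individual i fails in interval j. Each individual contributes the hazard once if a failure is observed (δ_i+ = 1) and the survival function in every case, so the expression factors into a product over intervals.

```latex
L = \prod_i \Bigl[ \prod_j \bigl( \lambda_j e^{X_i \beta} \bigr)^{\delta_{ij}} \Bigr]
      \exp\Bigl\{ -\sum_j \lambda_j\, t_{ij}\, e^{X_i \beta} \Bigr\}
  = \prod_i \prod_j \bigl( \lambda_j e^{X_i \beta} \bigr)^{\delta_{ij}}
      \exp\bigl\{ -\lambda_j\, t_{ij}\, e^{X_i \beta} \bigr\}
```
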
A further simplification can be realized if individuals with identical levels of the covariates are divided into groups. Summing the status indicators over the individuals in the kth risk group gives the total number of failures for the group during interval j. Likewise, the sum of the observed follow-up times during interval j for subjects in group k is T_kj = Σ t_ij, where the sum runs over the individuals i in group k, and the likelihood simplifies to
Notice that the kernel of the contribution to this likelihood by individuals in the kth group during the jth interval is identical to that of a Poisson distribution, which allows us to make use of this distribution when finding maximum likelihood estimates of the model parameters. If the hazard is a log-linear function of the covariates, the log likelihood becomes
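
The equivalence with the Poisson kernel can be checked numerically. The sketch below uses made-up failure counts and person-time (the names `D`, `T`, and `X` and all values are hypothetical) and assumes the log-linear hazard λ_j exp(X_k β). It evaluates the grouped piecewise exponential log likelihood and the log likelihood obtained by treating the number of failures in each cell as Poisson with mean λ_j T_kj exp(X_k β), then compares the gap between the two at different parameter values.

```python
import math

# Hypothetical grouped data: failures D[k][j] and person-time T[k][j]
# for risk groups k (rows) and time intervals j (columns).
D = [[2, 1, 0], [5, 3, 2]]
T = [[40.0, 35.5, 30.0], [38.0, 29.0, 21.5]]
X = [0.0, 1.0]  # one covariate value per risk group (assumed binary exposure)


def piecewise_exp_loglik(lam, beta):
    """Grouped piecewise exponential log likelihood (log-linear hazard)."""
    total = 0.0
    for k, x in enumerate(X):
        for j, lam_j in enumerate(lam):
            rate = lam_j * math.exp(x * beta)
            total += D[k][j] * math.log(rate) - rate * T[k][j]
    return total


def poisson_loglik(lam, beta):
    """Log likelihood treating D[k][j] as Poisson with mean rate * T[k][j]."""
    total = 0.0
    for k, x in enumerate(X):
        for j, lam_j in enumerate(lam):
            mu = lam_j * math.exp(x * beta) * T[k][j]
            # lgamma(d + 1) equals log(d!)
            total += D[k][j] * math.log(mu) - mu - math.lgamma(D[k][j] + 1)
    return total


def loglik_gap(lam, beta):
    """Difference between the two log likelihoods at given parameters."""
    return poisson_loglik(lam, beta) - piecewise_exp_loglik(lam, beta)


# The gap should be the same at any two parameter settings.
gap_a = loglik_gap([0.02, 0.03, 0.05], 0.4)
gap_b = loglik_gap([0.10, 0.01, 0.07], -1.2)
print(abs(gap_a - gap_b) < 1e-9)
```

Because the gap depends only on the data (it is the sum of D log T minus log D! over the cells), both log likelihoods are maximized by the same parameter values, which is why software for Poisson regression with log person-time as an offset can be used to fit this model.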

As we saw in our discussion of survival curve estimates in Chapter 5, a completely nonparametric model with respect to the time component could be realized by passing to the limit, letting the interval widths go to 0 and the number of intervals to infinity. A somewhat simpler representation of the method of analysis results from establishing a log likelihood that depends only on the regression parameters, β. For each interval containing a failure, we have a nuisance parameter involving the product of the interval width and the hazard, Δλ_j. Intervals that do not include deaths drop out entirely, and the only relevant information is provided by intervals that include failure times, so that the likelihood given in equation A7.1 can be expressed in terms of a product over the intervals. For the jth interval, the individuals that are available (that is, those with nonzero observed times during the interval) constitute a risk set, R_j. Thus, the log likelihood becomes

(A7.2)
We can find an expression for the maximum likelihood estimate of Δλ_j by setting the partial derivative of the log likelihood in equation A7.2 with respect to Δλ_j
to zero and solving, which yields
Substituting this expression into equation A7.2 results in
as a kernel log likelihood function (1–3), which is essentially the expression given originally by Cox (4). This expression can also be derived by selecting intervals over which the hazard is constant, with cutpoints identified by the observed follow-up times (5). However, a mathematically rigorous development of this expression involves some of the deepest issues in statistical inference. Kalbfleisch and Prentice (6) give conditions under which this can be considered to be a marginal likelihood, but in a more general setting this problem has led to the development of the concept of partial likelihood (7, 8).
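
The steps in this derivation can be sketched as follows, assuming no tied failure times, so that j indexes the distinct failure times and X_j denotes the covariates of the individual who fails at the jth time. The symbols ℓ, l, and J (the number of failure times) are notation introduced for this sketch, and the first line holds only up to an additive constant involving the interval width Δ.

```latex
% Log likelihood with one nuisance parameter (Delta lambda_j) per failure time
\ell(\beta, \Delta\lambda)
  = \sum_{j=1}^{J} \Bigl[ \log(\Delta\lambda_j) + X_j \beta
      - \Delta\lambda_j \sum_{l \in R_j} e^{X_l \beta} \Bigr]

% Score for the nuisance parameter, set to zero and solved
\frac{\partial \ell}{\partial(\Delta\lambda_j)}
  = \frac{1}{\Delta\lambda_j} - \sum_{l \in R_j} e^{X_l \beta} = 0
\quad\Longrightarrow\quad
\widehat{\Delta\lambda_j} = \Bigl[ \sum_{l \in R_j} e^{X_l \beta} \Bigr]^{-1}

% Substituting back; the constant -J does not involve beta and is dropped
\ell(\beta) = \sum_{j=1}^{J} \Bigl[ X_j \beta
      - \log \sum_{l \in R_j} e^{X_l \beta} \Bigr]
```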

A further extension of this likelihood involves the inclusion of time-dependent covariates, Z(t). The contribution of each failure to the overall likelihood once again involves a consideration of the risk set at each failure time, that is, those individuals under active follow-up at that time. The contribution of the jth failure becomes

(A7.3)
and the time-dependent covariates are essentially functions that are evaluated at each failure time, t_j. The overall likelihood is found by multiplying these elements over the entire set of observed failure times.
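
As a minimal sketch of this calculation, the code below evaluates the log of the overall likelihood for a single time-dependent covariate, taking each failure's contribution to be exp(Z_j(t_j) β) divided by the sum of exp(Z_l(t_j) β) over the risk set. It assumes a log-linear hazard, no tied failure times, and that everyone is under follow-up from time 0. The data and the names `subjects` and `log_partial_likelihood` are hypothetical.

```python
import math

# Hypothetical subjects: (follow-up time, failed?, covariate path z(t)).
subjects = [
    (2.0, True, lambda t: 0.0),                        # never exposed
    (3.5, False, lambda t: 1.0 if t >= 1.0 else 0.0),  # exposed from t = 1
    (5.0, True, lambda t: 1.0 if t >= 4.0 else 0.0),   # exposed from t = 4
    (6.0, True, lambda t: 1.0),                        # always exposed
    (8.0, False, lambda t: 0.0),                       # never exposed
]


def log_partial_likelihood(beta, data):
    """Sum the log contribution of each observed failure time."""
    total = 0.0
    for time_j, failed, z_j in data:
        if not failed:
            continue  # censored subjects enter only through risk sets
        # Risk set: everyone still under follow-up at this failure time.
        risk_set = [z for (time_l, _, z) in data if time_l >= time_j]
        # Every covariate path is evaluated at the current failure time.
        denom = sum(math.exp(z(time_j) * beta) for z in risk_set)
        total += z_j(time_j) * beta - math.log(denom)
    return total


print(log_partial_likelihood(0.0, subjects))
print(log_partial_likelihood(1.0, subjects))
```

When every covariate path is constant in time, the same function returns the ordinary kernel log likelihood for fixed covariates; the only change needed for time dependence is that each path is re-evaluated at every failure time.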