Survival Times and Probabilities

My acquaintance gwern had a question: given the current total lifetime of a certain web service, how to infer the probable future lifetime?

Quick Estimate

A quick and dirty intuitive method of calculating this uses Laplace’s rule of succession: treat survival or failure as a Bernoulli trial per unit time, so that given that the subject has survived for N days without a closure on any of those days, we could say that the probability of surviving the next day is as given:

p (N) = \frac{1}{N + 2}

$p(N) = \frac{1}{N+2}$

In that case the probability that we survive at least the next $n$ days could be $p^n$ , or

P (n | N) = {(\frac{1}{N + 2})}^{n}

$P(n | N) = \left(\frac{1}{N+2}\right)^n$

But of course that’s not really right, because after the first of those $n$ days we have one more day of information, so we’re a little bit more likely to survive the next. Therefore it can’t just be $p^n$ . We really want:

P (n | N) = \prod_{d = 0}^{n - 1} (\frac{1}{N + d + 2})

$P(n | N) = \prod_{d=0}^{n-1} \left(\frac{1}{N+d+2}\right)$

Problem?

The above solutions have two big problems. The first is somewhat obvious: the second, more correct solution requires $n$ to be an integral multiple of your unit of time. You can’t ask about $n=1.618$ days, because you can’t take the product of discrete steps from $d=0$ to $1.618$ .

The second problem is more subtle: neither of these solutions are invariant with respect to the base unit of time. This is easiest to see with the first solution – changing from measuring in days to months turns $\left(\frac{1}{N+2}\right)^n$ into $\left(\frac{1}{N/30+2}\right)^{(n/30)}$ which is not identically equal. This is a problem because our units of time are an arbitrary convention and should make no difference to the result.

A True Solution

The resolution to both those problems lies in observing that the rule of succession isn’t really aware of time as a continuous (as opposed to discrete) notion. We need a proper model. As it turns out, the usual model for ‘lifetimes’ of things is a negative exponential:

P (T = t | a) = a e^{- a t}

$P(T=t | a) = a e^{-at}$

P (T > t | a) = e^{- a t}

$P(T>t | a) = e^{-at}$

This could perhaps be called the continuous generalization of a Bernoulli trial. The event has a constant probability per unit time given that it hasn’t happened yet, described by $a$ , which is called the “hazard rate”. $a$ has units of inverse time, or “events per unit time”, as you’d expect.

In our case, $a$ is an unknown variable. The only input we’ve been given is the survival time so far, $Y$ , from which we must calculate the distribution for the future survival time, $X$ (switching to variables $X$ and $Y$ here rather than $n,N$ since we’re no longer thinking discretely). Our goal is to calculate the posterior distribution:

P (X = x | Y = y)

$P(X=x | Y=y)$

P (T = (x + y) | T > y)

$P(T=(x+y) | T>y)$

We can start by calculating the posterior distribution on $a$ :

\begin{aligned} P (a | T > y) & = \frac{P (T > y | a) P (a)}{P (T > y)} \\ = \frac{P (T > y | a) P (a)}{\int d a^{'} P (T > y | a^{'}) P (a^{'})} \end{aligned}

$\begin{aligned} P(a | T>y) &= \frac{P(T>y|a) P(a)}{P(T>y)} \\ &= \frac{P(T>y|a) P(a)}{\int da' ~ P(T>y|a') P(a')} \end{aligned}$

A reasonable choice of prior $P(a)$ is also a negative exponential $P(a) = w e^{-wa}$ . We can take the limit $w \to 0$ for the flat (improper) prior of maximum ignorance. Then,

\begin{aligned} P (T > y | a) P (a) & = e^{- a y} w e^{- w a} \\ = w e^{- a (y + w)} \\ P (T > y) & = w \int_{0}^{\infty} e^{- a (y + w)} d a \\ = \frac{w}{y + w} \end{aligned}

$\begin{aligned} P(T>y | a) P(a) &= e^{-ay} w e^{-wa} \\ &= w e^{-a(y+w)} \\ P(T>y) &= w \int_0^\infty e^{-a(y+w)} ~ da \\ &= \frac{w}{y+w} \end{aligned}$

So we have our posterior distribution on $a$ :

\begin{aligned} P (a | T > y) & = (y + w) e^{- a (y + w)} \end{aligned}

$\begin{aligned} P(a | T>y) &= (y + w) e^{-a(y+w)} \end{aligned}$

With this we can calculate the posterior pdf on $X$ , by marginalizing over $a$ :

\begin{aligned} P (T = (x + y) | T > y) & = \int_{0}^{\infty} a e^{- a x} P (a | T > y) d a \\ = \int_{0}^{\infty} a e^{- a x} (y + w) e^{- a (y + w)} d a \\ = (y + w) \int_{0}^{\infty} a e^{- a (w + y + x)} d a \\ = \frac{w + y}{(w + y + x)^{2}} \end{aligned}

$\begin{aligned} P(T=(x+y) | T>y) &= \int_0^∞ a e^{-ax} P(a|T>y) ~ da \\ &= \int_0^∞ a e^{-ax} (y + w) e^{-a(y+w)} ~ da \\ &= (y + w) \int_0^∞ a e^{-a(w + y + x)} ~ da \\ &= \frac{w + y}{(w + y + x)^2} \end{aligned}$

And the posterior reverse cumulative distribution:

\begin{aligned} P (T > (x + y) | T > y) & = \int_{0}^{\infty} e^{- a x} (y + w) e^{- a (y + w)} d a \\ = \int_{0}^{\infty} (y + w) e^{- a (w + y + x)} d a \\ = \frac{w + y}{w + y + x} \end{aligned}

$\begin{aligned} P(T>(x+y) | T>y) &= \int_0^∞ { e^{-ax} ~ (y + w) e^{-a(y+w)} } ~ da \\ &= \int_0^∞ { (y + w) e^{-a(w+y+x)} } ~ da \\ &= \frac{w + y}{w + y + x} \end{aligned}$

Notice that $w$ (the prior) seems to affect the result in the same way as would $w$ units of previously observed survival time. Taking the ignorance prior $w=0$ , the results are:

\begin{aligned} P (a | Y = y) & = y e^{- a y} \\ P (X = x | Y = y) & = \frac{y}{(y + x)^{2}} \\ P (X > x | Y = y) & = \frac{y}{y + x} \end{aligned}

$\begin{aligned} P(a | Y=y) &= y e^{-ay} \\ P(X=x | Y=y) &= \frac{y}{(y + x)^2} \\ P(X>x | Y=y) &= \frac{y}{y + x} \end{aligned}$

The expected value of $a$ is $1/y$ which is quite reasonable, but notice that since $x P(X=x | Y=y)$ goes as $1/x$ asymptotically, the expectation $E[x|y]$ doesn’t converge.

Also notice that these results are properly invariant with respect to the unit of time. $P(X>x | Y=y)$ is a unitless probability, and therefore is completely invariant as it should be, while $P(X=x | Y=y)$ and $P(a | Y=y)$ are probability densities with respect to time and inverse time, and therefore have units of inverse time / time as they should.