This section addresses two main points: how to specify a model for the library using distributions defined by hazards, and why such a specification, with its initial conditions, is sufficient to define the trajectory of a model.
The discrete case is much easier to understand than the continuous case because it can be explained without employing any results from calculus. Throughout this section, \(\bf{X}\) will be assumed to be a real-valued random variable. For example, \(\bf{X}\) could represent latent periods for measles.
It frequently happens that random samples of real-valued variables such as \(\bf{X}\) are actually analyzed on a discrete scale. For example, Stocks’ data on latent periods of measles in Figure 1 (distribution of latent periods for measles in London circa 1931) is based on daily visits by patients.
The (cumulative) distribution of \(\bf{X}\) is defined as
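\[
F_{X}(k) = \mathcal{P}[X \le k],
\]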
assuming \(F_{X}(\infty) = 1\). The density can be expressed as the difference in adjacent values of the distribution
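\[
f_{X}(k) = F_{X}(k) - F_{X}(k-1).
\]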
For Stocks’ data in Figure 1, the density at day \(k\) should be interpreted as the probability that symptoms have appeared since the previous visit on day \(k-1\).
The hazard is defined as the conditional probability that the value of a random selection from \(\bf{X}\) equals \(k\), given that this value is already known to exceed \(k-1\). Using the usual rules for computing conditional probabilities, the hazard is given by the following ratio
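\[
\lambda_{X}(k) = \mathcal{P}[X = k \,|\, X > k-1] = \frac{f_{X}(k)}{1 - F_{X}(k-1)}.
\]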
In the case of Stocks’ data, the hazards shown in Figure 2 (estimated hazards of latent periods for measles in London circa 1931) would correspond to the probability of symptoms appearing on day \(k\) given that the patient had not displayed symptoms at any previous visit. As time goes on, patients who have already developed symptoms effectively reduce the pool of patients in the study who are still in a state where they might first present symptoms on day \(k\). This is the origin of the term in the denominator.
On any given day, the hazard for latent periods can be interpreted as the rate of appearance of symptoms per asymptomatic (infected but not yet symptomatic) patient per day. For example, the hazard inferred from the Weibull distribution is approximately \(0.15\) on day 10. In other words, 15% of the patients that are asymptomatic on day 9 will present symptoms when examined on day 10.
This interpretation is extremely important because it connects a hazard with a rate for a specific process, and that rate has well-defined units of measurement. In addition, it clarifies how rate parameters should be estimated from observational data. Failure to account for the shrinking pool over time is commonplace. In this case it would lead to systematic errors in the estimation of process rates, especially at long times, when the depletion effect is most pronounced.
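As a concrete sketch (in Python, with invented daily counts standing in for Stocks’ actual data), the hazard for each day divides the number of patients first presenting symptoms that day by the number still at risk, not by the total:

```python
# Invented daily counts of patients first presenting symptoms on each day;
# these stand in for Stocks' data and are not the published values.
counts = [0, 1, 3, 8, 14, 18, 16, 11, 6, 3, 1]

at_risk = sum(counts)  # patients still asymptomatic at the start of day 1
for day, new_cases in enumerate(counts, start=1):
    hazard = new_cases / at_risk  # rate per asymptomatic patient per day
    print(f"day {day:2d}: hazard = {hazard:.3f}")
    at_risk -= new_cases  # the risk pool shrinks as patients convert
```

Dividing by the total count instead of the shrinking `at_risk` pool would estimate the density rather than the hazard, understating the rate at long times.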
The random variable \(\bf{X}\) is again assumed to be real-valued, but the measurements will not be binned as above. The cumulative distribution is now a function not of an integer \(k\) but of a continuous time interval, \(\tau\),
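\[
F_{X}(\tau) = \mathcal{P}[X \le \tau],
\]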
assuming \(F_X(\infty)=1\). The density is the derivative of the cumulative distribution. The concept of the hazard comes from survival analysis, where the survival, \(G_X(\tau)=1-F_X(\tau)\), represents the probability that the random variable, a time interval, is longer than \(\tau\). One expression for the hazard states that the density of the random variable is equal to the probability it survives to a time \(\tau\) multiplied by the hazard rate for firing at time \(\tau\), or, in probabilities,
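\[
\mathcal{P}[\tau < X \le \tau + d\tau] = \mathcal{P}[X > \tau]\,\mathcal{P}[\tau < X \le \tau + d\tau \,|\, X > \tau].
\]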
Writing this same equation with its almost-sure equivalents defines the continuous hazard, \(\lambda_X(\tau)\),
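\[
f_{X}(\tau) = G_{X}(\tau)\,\lambda_{X}(\tau).
\]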
This is just a rearrangement away from the definition used in the discrete case.
This library accepts a specification of a model in terms of hazards and an initial condition, and produces trajectories. This set of high-level steps to simulation (specify, initialize, step) has a well-defined abstraction called a finite state machine. It isn’t the finite state machine familiar to programmers but a mathematical model, coming from category theory, for a particularly simple class of computing systems. At a conceptual level, a finite state machine can be considered a black box that receives a sequence of input signals and produces an output signal for each input signal. Internally, the black box maintains a state – some sort of finite summary representation of the sequence of input signals encountered so far. For each input signal, the box performs two operations: it produces an output signal and it updates its internal state. In both cases, the result depends on the current internal state and the identity of the input signal just received.
It is helpful to view the finite state machine layer as a mechanism to simulate a Markov chain or Markov process.
Roughly speaking, a Markov chain, \(\bf{X}\), is a probabilistic system that makes random jumps among a finite set of distinct states, \(s_0, s_1, s_2, \ldots, s_N\), such that the probability of choosing the next state, \(X_{n+1}\), depends only on the current state, \(X_n\). In mathematical terms, the conditional probabilities for state transitions must satisfy
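\[
\mathcal{P}[X_{n+1} = s_{l} \,|\, X_{n} = s_{k}, X_{n-1} = s_{k_{n-1}}, \ldots, X_{0} = s_{k_{0}}]
= \mathcal{P}[X_{n+1} = s_{l} \,|\, X_{n} = s_{k}].
\]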
Since more distant history does not affect future behavior, Markov chains are sometimes characterized as memoryless.
This relation can be iterated to compute the conditional probabilities for multiple time steps
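\[
\mathcal{P}[X_{n+2} = s_{l} \,|\, X_{n} = s_{k}]
= \sum_{m} \mathcal{P}[X_{n+2} = s_{l} \,|\, X_{n+1} = s_{m}]\,
\mathcal{P}[X_{n+1} = s_{m} \,|\, X_{n} = s_{k}].
\]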
Note, the transition probabilities \(\mathcal{P}[X_{n+1} = s_{l} | X_{n}=s_k]\) may depend on time (through the index \(n\)). These so-called time-inhomogeneous Markov chains arise when the system of interest is driven by external entities. Chains with time-independent conditional transition probabilities are called time-homogeneous. The dynamics of a time-homogeneous Markov chain is completely determined by the initial state and the transition probabilities. All processes considered in this document are time-homogeneous.
A Markov process is a generalization of the Markov chain such that time is viewed as continuous rather than discrete. As a result, it makes sense to record the times at which the transitions occur as part of the process itself.
The first step in this generalization is to define a stochastic process \(\bf{Y}\) that includes the transition times as well as the state, \(Y_{n} = (s_{j},t_{n})\).
The second step is to treat time on a truly continuous basis by defining a new stochastic process, \(\bf{Z}\), from \(\bf{Y}\) by the rule \(Z_{t} = s_k\) in the time interval \(t_n \le t < t_{n+1}\), given \(Y_{n} = (s_k, t_n)\). In other words, \(\bf{Z}_{t}\) is a piecewise constant version of \(\bf{Y}\), as shown in Figure 4 (realization of a continuous-time stochastic process and associated Markov chain).
A realization of the process \(\bf{Y}\) is defined by the closed diamonds (left end points) alone. Similarly, a realization of the process \(\bf{Z}_t\) is illustrated by the closed diamonds and line segments. The closed and open diamonds at the ends of the line segment indicate that the segments include the left but not the right end points.
The memoryless property for Markov processes is considerably more delicate than in the case of Markov chains because the time variable is continuous rather than discrete. In the case of \(\bf{Y}\), the conditional probabilities for state transitions must satisfy
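\[
\mathcal{P}[Y_{n+1} = (s_{l}, t_{n+1}) \,|\, Y_{n} = (s_{k}, t_{n}), \ldots, Y_{0} = (s_{k_{0}}, t_{0})]
= \mathcal{P}[Y_{n+1} = (s_{l}, t_{n+1}) \,|\, Y_{n} = (s_{k}, t_{n})].
\]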
The proper generalization of the requirement of time-homogeneity stated previously for Markov chains is that the joint probability be unchanged by uniform shifts in time
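\[
\mathcal{P}[Y_{n+1} = (s_{l}, t + \tau) \,|\, Y_{n} = (s_{k}, s + \tau)]
= \mathcal{P}[Y_{n+1} = (s_{l}, t) \,|\, Y_{n} = (s_{k}, s)]
\]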
for \(0<s<t\) and \(\tau > 0\). Stochastic processes with shift invariant state transition probabilities are called stationary.
When we examined hazard rates above, we were examining the rate of transitions for a Markov process. The joint probability of the next state of the Markov process and of the interval until that transition occurs is called the core matrix,
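\[
Q_{ij}(\tau) = \mathcal{P}[X_{n+1} = s_{j},\; t_{n+1} - t_{n} \le \tau \,|\, X_{n} = s_{i}],
\]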
indicating a state change between the states \((s_i,s_j)\). The derivative of this is a rate,
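\[
q_{ij}(\tau) = \frac{dQ_{ij}(\tau)}{d\tau},
\]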
which is a joint distribution over states and time intervals. Normalization for this quantity sums over possible states and future times,
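\[
\sum_{j} \int_{0}^{\infty} q_{ij}(\tau)\,d\tau = 1.
\]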
The survival, in terms of the core matrix, is
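\[
G_{i}(\tau) = 1 - \sum_{j} Q_{ij}(\tau).
\]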
This means our hazard is
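\[
\lambda_{ij}(\tau) = \frac{q_{ij}(\tau)}{G_{i}(\tau)}.
\]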
For the measles example, the set of future states \(j\) of each individual includes symptomatic and all the possible other ways an individual leaves the study, so you can think of \(j=\mbox{left town}\). In practice, we build a hazard in two steps. First, count the probability, over all time, of reaching any one eventual state \(j\). This is the same transition probability \(\pi_{ij}\) that is seen in Markov chains. Second, measure the distribution of times at which individuals enter each new state \(j\), given that they are headed to that state. This is called the holding time, \(h_{ij}(\tau)\), and is a conditional probability. Together, these two give us the core matrix,
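\[
q_{ij}(\tau) = \pi_{ij}\, h_{ij}(\tau).
\]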
Note that \(h_{ij}(\tau)\) is a density whose integral \(H_{ij}(\tau)\) is a cumulative distribution. If we write the same equation in terms of probabilities, we see that it amounts to separating the Markov process into a marginal and conditional distribution.
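\[
\underbrace{\mathcal{P}[X_{n+1} = s_{j},\; t_{n+1} - t_{n} \le \tau \,|\, X_{n} = s_{i}]}_{Q_{ij}(\tau)}
= \underbrace{\mathcal{P}[X_{n+1} = s_{j} \,|\, X_{n} = s_{i}]}_{\pi_{ij}}\;
\underbrace{\mathcal{P}[t_{n+1} - t_{n} \le \tau \,|\, X_{n} = s_{i}, X_{n+1} = s_{j}]}_{H_{ij}(\tau)}.
\]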
Choosing the other option for the marginal gives us the waiting time formulation for the core matrix. It corresponds to asking first what is the distribution of times at which the next event happens, no matter which event, and then asking which events are more likely given the time of the event.
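In symbols, with \(w_{i}(\tau) = \sum_{j} q_{ij}(\tau)\) as the waiting-time density and \(\pi_{ij}(\tau)\) as the probability of choosing state \(s_{j}\) given that the transition fires at time \(\tau\), this reads
\[
q_{ij}(\tau) = \pi_{ij}(\tau)\, w_{i}(\tau).
\]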
While the waiting-time density, \(w_i(\tau)\), is the derivative of the waiting time, we won’t end up needing to relate \(\pi_{ij}(\tau)\) to \(\Pi_{ij}(\tau)\) when finding trajectories or computing hazards, so the more complicated relationship won’t be a problem.
A generalized stochastic Petri net (GSPN) is a formal way to specify a system of interacting, competing processes. Different organisms can compete, but, for this system, the likelihood of infecting a neighbor and the likelihood of recovery are seen as competing as well.
Define a system by placing tokens at places, the way you would put checkers on a game board. Each place represents a sub-state of the system, such as a herd of animals. Five tokens on a place representing a herd means the herd has five animals.
Transitions compete to move the tokens. Each transition is an independent process. (We explain later how and why independent processes are able to represent biological processes that are clearly dependent.) Only transitions change the state. Each one triggers according to its own internal clock. This library can model non-exponential distributions of firing times.
There are many flavors of GSPN defined by various authors. The following model was chosen to support complex epidemiological simulations while still representing a semi-Markov system. In brief, for those familiar with GSPN, this system supports non-exponential distributions, without inhibitor arcs, with colored tokens, without immediate transitions, permitting simultaneous transitions.
The what of the GSPN is its structure: a set of places, a set of transitions, and arcs connecting each transition to its input and output places, each arc carrying a stoichiometric coefficient. The state of the system is a marking, which assigns tokens to places.
The how of the GSPN is its dynamics, the rules by which transitions become enabled, fire, and change the marking.
A transition becomes enabled when there are as many tokens on the input places as the stoichiometric coefficients require. Whether the output places need to be empty is a policy choice. The enabling time of a transition is part of the state of the system.
An enabled transition computes, from the marking on its input and output places and from any properties of the tokens in that marking, a continuous stochastic variable which determines the likelihood that the transition will fire at any time in the future. This stochastic variable need not have an exponential distribution.
The system samples jointly from the stochastic variables of every enabled transition in order to determine which transition fires next. This is described in great detail later.
When a transition fires, it removes from the marking of its input places the specified number of tokens and puts them in the output places. Any disparity in token count destroys or creates tokens. Which token is moved is specified by policy to be the first, last, or random token in the marking of the place.
Transitions which do not fire fall into four classes. Those transitions whose input and output places are not shared with the inputs and outputs of the fired transition retain their enabling time and may not make any changes to how their continuous variable distributions depend on the marking. Any dependence between transitions must be expressed by sharing input or output places.
Those previously-enabled transitions which share input or output places, and for which the new marking still meets their enabling requirements, retain their original enabling times but may recalculate their stochastic variable distributions depending on the marking.
Previously-enabled transitions whose input or output places no longer have requisite tokens for enabling become disabled.
All transitions which may become enabled after the firing are enabled with an enabling time set to the current system time.
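A minimal sketch of this bookkeeping, assuming hypothetical `Transition` objects with `inputs` and `outputs` maps from place to stoichiometric coefficient and a `recalculate` hook (not this library’s actual API; token-identity policies are omitted):

```python
def fire(fired, transitions, marking, enabled, now):
    """Fire one transition, then reclassify every transition."""
    # Move tokens: remove from input places, add to output places.
    for place, coeff in fired.inputs.items():
        marking[place] -= coeff
    for place, coeff in fired.outputs.items():
        marking[place] += coeff
    del enabled[fired]  # the fired transition must re-enable from scratch

    touched = set(fired.inputs) | set(fired.outputs)
    for t in transitions:
        can_fire = all(marking[p] >= c for p, c in t.inputs.items())
        if t in enabled:
            if not can_fire:
                del enabled[t]           # lost its tokens: becomes disabled
            elif touched & (set(t.inputs) | set(t.outputs)):
                t.recalculate(marking)   # shares places: keeps its enabling
                                         # time but may change its distribution
            # otherwise: unaffected, keeps enabling time and distribution
        elif can_fire:
            enabled[t] = now             # newly enabled at the current time
```

Here `enabled` maps each enabled transition to its enabling time, so the four classes above correspond to the four branches of the loop.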
Each transition defines an independent continuous stochastic variable, whose density is \(f_\alpha(\tau, t_e)\) and cumulative distribution is \(F_\alpha(\tau, t_e)\), where \(t_e\) is the time at which the transition was enabled. If we define a new stochastic variable whose value is the minimum of samples from all of the \(f_\alpha\), then it defines the time of the next transition, and the index \(\alpha\) of the transition that came first determines, by that transition’s change to the marking, the new system state, \(j\). This means we have a race. The racing processes are dependent only when they modify shared marking.
If the \(f_\alpha\) are all exponential processes, then the minimum of the stochastic variates is analytically tractable, and the result is exactly Gillespie’s method. Using a random number generator to sample each process separately would, again for exponential processes, result in the First Reaction method. The Next Reaction method is just another way to do this sampling in a statistically correct manner, valid only for exponential processes.
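For illustration, a sketch of the First Reaction method for the exponential case, where `rates` is a hypothetical map from each enabled transition to its hazard rate:

```python
import math
import random

def next_firing(rates, now):
    """Sample every enabled transition's clock; the earliest one fires."""
    winner, winner_time = None, math.inf
    for transition, rate in rates.items():
        t = now + random.expovariate(rate)  # exponential firing-time sample
        if t < winner_time:
            winner, winner_time = transition, t
    return winner, winner_time
```

Repeating this draw after every firing, with the rates recomputed from the new marking, generates a trajectory.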
For general distributions, the non-zero part of the core matrix, which we will call \(p_{nm}(\tau)\) for partial core matrix, right after the firing of any transition, is
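\[
p_{nm}(\tau) = f_{\alpha}(\tau, t_{e\alpha}) \prod_{\beta \ne \alpha} \left(1 - F_{\beta}(\tau, t_{e\beta})\right),
\]
where the product runs over all other enabled transitions \(\beta\), each with its own enabling time.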
Here \((n,m)\) label states determined by how each transition, \(\alpha\), changes the marking. The system has, inherently, a core matrix, but we compute its trajectory from this partial core matrix.