The Semi-Markov library is designed to streamline the process of creating efficient simulations of a large class of systems called semi-Markov processes [Howard:1971]. Semi-Markov processes arise naturally in many contexts, including epidemiology [Viet:2004], physiology, ecology, atmospheric sciences, reliability engineering and risk management. This broad range of applications suggests the value of designing a generic library for simulating complex semi-Markov processes, independent of the particular application area.
The unifying idea on which this library is based is that a complex system typically has many different pathways by which it can evolve between time steps. Each pathway can be viewed as an elementary stochastic process with user-specified, time-dependent transition rates and a rule for modifying the overall internal state of the system. At each instant of time, these elementary processes “compete”, figuratively speaking, for the chance to change the state of the whole system. Each time step in the simulation corresponds to an event: a “winner” is selected, the internal state of the system is changed according to the winner's rule, and the time increment is sampled from the corresponding statistical distribution. This competing-process view provides a framework for users to develop simulations of complex models incrementally.
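The competition described above can be illustrated with a minimal Python sketch. It is not the library's API: the pathway names, the states, and every distribution parameter are hypothetical and chosen only for illustration. Each enabled pathway draws a firing delay from its own distribution, the smallest delay wins, and the winner's rule updates the state. Because this example tracks a single unit, every transition disables all competitors, so all delays are redrawn after each event; a simulation with many concurrently enabled processes would have to keep the clocks of processes that remain enabled.

```python
import random


def make_pathways(rng):
    """Hypothetical pathways: (name, source state, target state, delay sampler)."""
    return [
        # Latent period: Weibull with scale 10 and shape 3 (illustrative values).
        ("onset", "latent", "infectious", lambda: rng.weibullvariate(10.0, 3.0)),
        # Recovery: gamma with shape 4 and scale 2 (illustrative values).
        ("recover", "infectious", "recovered", lambda: rng.gammavariate(4.0, 2.0)),
        # Death: exponential with rate 0.02, i.e. a constant hazard.
        ("die", "infectious", "dead", lambda: rng.expovariate(0.02)),
    ]


def simulate(initial_state, pathways):
    """Run until no pathway is enabled; return a list of (time, event, state)."""
    state, now = initial_state, 0.0
    history = [(now, None, state)]
    while True:
        # Pathways whose source matches the current state are enabled and compete.
        enabled = [p for p in pathways if p[1] == state]
        if not enabled:
            break
        # Every competitor draws a firing delay; the smallest delay wins.
        delay, winner = min(((p[3](), p) for p in enabled), key=lambda pair: pair[0])
        now += delay          # the time increment comes from the winner's draw
        state = winner[2]     # the winner's rule changes the system state
        history.append((now, winner[0], state))
    return history


if __name__ == "__main__":
    rng = random.Random(2024)
    for time, event, state in simulate("latent", make_pathways(rng)):
        print(f"{time:8.3f}  {str(event):8s}  {state}")
```

Adding a new behavior to this sketch means appending one more tuple to the pathway list, which mirrors the incremental style of model building that the competing-process view is meant to support.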
It is easy to show that competing processes with exponentially distributed transition times have time-independent transition rates. The exponential assumption is the norm in some application areas such as chemical kinetics. In contrast, it is manifestly inappropriate for many biological applications such as physiology, ecology and epidemiology. For example, a classic paper by Stocks [Stocks:1931] clearly shows that the latent period for measles (the distribution of times between infection and the appearance of symptoms) does not follow an exponential distribution. Stocks’ raw data from cases in London circa 1931, along with optimal fits to exponential, gamma, Weibull, and log-normal distributions computed using the SciPy statistical library, are shown in Figure 1.

Figure 1. Distribution of latent periods for measles in London circa 1931. The fit of the data to the exponential distribution is very poor, while the fits to the other distributions are very good.
This simple example shows that exclusive reliance on exponential distributions may lead to systematic biases in stochastic simulations of epidemiological processes. Therefore, this library provides support for general semi-Markov models based on competing processes with general probability distributions of transition times.
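The link between exponentially distributed transition times and time-independent rates can be made explicit through the hazard rate. The following is a brief sketch; the symbols are introduced here for illustration and are not taken from the library's documentation. Let T be the transition time of one elementary process, measured from the moment it becomes enabled, with density f and survival function S.

```latex
\lambda(t) = \frac{f(t)}{S(t)}, \qquad S(t) = \Pr(T > t) = 1 - \int_0^t f(s)\,ds

\text{Exponential with rate } \lambda_0:\quad
f(t) = \lambda_0 e^{-\lambda_0 t},\quad S(t) = e^{-\lambda_0 t}
\;\Longrightarrow\; \lambda(t) = \lambda_0

\text{Weibull with shape } k \text{ and scale } \theta:\quad
f(t) = \frac{k}{\theta}\left(\frac{t}{\theta}\right)^{k-1} e^{-(t/\theta)^k},\quad
S(t) = e^{-(t/\theta)^k}
\;\Longrightarrow\; \lambda(t) = \frac{k}{\theta}\left(\frac{t}{\theta}\right)^{k-1}
```

The Weibull rate is constant only when k = 1, which recovers the exponential case. For k > 1 the rate rises with the time elapsed since the process became enabled, a feature that a constant-rate model cannot reproduce.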
The library is implemented using three cooperating layers:
This organization has many practical advantages:
Building a dynamical model, as this library does, is one step toward a larger goal. That goal could be model selection, which asks which conceptual process best describes observed data, or it could be optimization of interventions to hinder or encourage an outcome. For example, see [Hartig2011] on using dynamical models for inference. These larger goals guide construction of a conceptual model for the problem, one that includes the behaviors that seem most relevant.
The steps from conception to a running program that uses the Semi-Markov library are:
This library was created by the Analytical Framework for Infectious Disease Dynamics (AFIDD) group at Cornell University in conjunction with the USDA Agricultural Research Service. This work was supported by the Science & Technology Directorate, Department of Homeland Security via interagency agreement no. HSHQDC-10-X-00138.
This library is in the public domain.