Summary
This report was produced for a course in stochastic processes. I explored how birth-death processes, a type of continuous time Markov chain, could be applied to epidemic modelling. I found that framing epidemic outbreaks as birth-death processes allows one to estimate the probability of a major epidemic occurring, and the subsequent duration and final sizes. The projections in the report are not meant to be accurate as the models made numerous unrealistic assumptions - this was just an academic exercise in modelling.
Introduction
COVID-19 has been declared a global pandemic by the WHO (Cucinotta and Vanelli 2020), and a lot of attention has been focused on the development of epidemiological models to understand its dynamics. The Susceptible-Infectious-Removed (SIR) epidemic model is a well-known model that captures the dynamics of an epidemic (McKendrick 1925). A SIR model comprises of three compartments,
where
The model described above is deterministic and does not have any inherent randomness. In other words, given certain parameters, the model will only output one fixed trajectory of an epidemic. Deterministic models are valid for sufficiently large populations, but are unable to capture typically stochastic phenomena like extinction (Rock et al. 2014). In situations where the number of infectious individuals is small or when the variability in transmission and recovery impacts the epidemic outcome, the randomness in stochastic models are more appropriate (Allen 2017).
The objective of this project is to explore the stochastic SIR epidemic model in the framework of a continuous-time Markov chain – more specifically, a birth-death process (BDP). This includes explaining the theoretical formulation of an SIR BDP, before fitting it onto COVID-19 case data and conducting simulation-based inference from the models.
Methods
SIR as a Continuous-Time Markov Chain & Birth-Death Process
To formulate the SIR model as a continuous-time Markov chain (CTMC), two discrete random variables
For a small period of time
The CTMC is a time-homogeneous process, in that the transition probabilities do not depend on specific values of
Event | Change |
Probability |
---|---|---|
Infection | ||
Recovery |
The model is initialised at time
Parameter estimation
Data
The dataset used in this paper was sourced from the Johns Hopkins University COVID-19 repository (CSSE 2020). The repository contains data on the daily number of confirmed cases, deaths and recovered cases for every country affected by the pandemic and is updated daily.
Maximum Likelihood Estimation
The likelihood for a continuously observed BDP is formulated as such:
where
However, this likelihood is difficult to evaluate analytically because the number of COVID-19 cases is only reported at discrete time points (daily) rather than over a continuous time. When a BDP is sampled discretely such that only
Therefore, a data augmentation technique was applied to approximate a continuously observed BDP. This involved splitting the daily reported case number into individual jumps of

An example of the result is illustrated above. On the 73rd day since the first case in Singapore, there were 75 new cases and 16 recoveries reported. The data augmentation process split the time interval (72,73) into 91 bins and computed 75 up steps and 16 down steps at random. Conducting this data augmentation technique on the daily case data enabled the maximisation of the likelihood function to estimate the transition rates
Country | SE | SE | ||||||
---|---|---|---|---|---|---|---|---|
France | 0.100 | 2.06 |
0.0371 | 1.25 |
2.71 | |||
Singapore | 0.138 | 7.78 |
0.0163 | 2.67 |
8.46 | |||
South Korea | 0.0409 | 7.77 |
0.0319 | 6.86 |
1.28 | |||
Sweden | 0.0729 | 1.37 |
0.0106 | 5.21 |
6.88 | |||
Thailand | 0.0766 | 7.67 |
0.0547 | 6.48 |
1.4 |
A limitation of the linear BDP is that it assumed that rates
As such, a more accurate interpretation of
Despite their simplification, the MLE estimates for
Simulation-based Results
The estimated model parameters were used to simulate various trajectories of the epidemic in the respective countries. The Gillespie algorithm allows for the simulation of possible solutions of a stochastic equation (Gillespie 1977) and can be applied in this scenario. The pseudocode of the algorithm is provided below.
|
The stochastic nature of each epidemic allows some inference to be made about the paths of the epidemic. These include the i. probability of a major outbreak, ii. the final size and iii. duration of the epidemic. For each country, a total of 100 simulations were run until
Probability of Major Outbreak
Due to the stochasticity of the birth-death process, there is a possibility of infections being removed before a major epidemic occurred, so not all processes will lead to major outbreaks. Here, a major epidemic is characterised as one that has a generally longer duration and substantially greater number of cases than a minor epidemic (Tritch and Allen 2018). Theoretically, only minor epidemics occur when


The probability of a major epidemic occurring can be approximated by the proportion of simulated Markov chains that resulted in a major epidemic. The results are summarised below.
Country | Theoretical |
Estimated |
---|---|---|
France | 0.631 | 0.84 |
Singapore | 0.881 | 0.85 |
South Korea | 0.219 | 0.24 |
Sweden | 0.855 | 0.86 |
Thailand | 0.286 | 0.5 |
The estimated probability of a major outbreak is largely correlated to the theoretical value
The results are consistent with the understanding that South Korea has been successful in managing the outbreak and effectively reduced the probability of a major outbreak occurring. On the other hand, the probability of major outbreaks occurring in the other countries are substantially higher.
Final Size
The final size of the outbreak refers to the total number of people expected to be infected by the disease. Bayes’ rule can be leveraged to estimate the final size based on the outcome of the simulated epidemics.

The typical distribution of final sizes is illustrated above. The distribution is characteristically bimodal with the final size depending on whether a minor or major epidemic occurred. The simulation results are summarised below and compared with the final size of the deterministic models.
Country | % | ODE | % | |||
---|---|---|---|---|---|---|
France | 1.73 |
3.25 | 1.44 |
21.9 | 1.71 |
26 |
Singapore | 3.68 |
1 | 3.12 |
53.4 | 3.67 |
62.8 |
South Korea | 1.39 |
2.54 | 3.35 |
0.652 | 1.36 |
2.65 |
Sweden | 5.79 |
1.36 | 4.98 |
49.4 | 5.79 |
57.4 |
Thailand | 3.17 |
3.22 | 1.58 |
2.23 | 3.17 |
4.53 |
From the table above, the expected final sizes given that a major outbreak occurred
It is also interesting to note the differences in the estimated final size
Expected Duration
The simulated results also allow one to estimate the duration of the outbreak, or in other words, the extinction time. Most theoretical research has focused on the duration of epidemics conditioned on non-extinction (Tritch & Allen, 2018). However, since the outbreaks were simulated till extinction, this project will only estimate the duration conditioned on extinction. Similar to the final size estimation, a possible simulation-based estimate of duration can be formulated by Bayes rule as such:

The plot above illustrates the distribution of extinction times. It follows a clear bimodal distribution given the respective outcomes (minor or major outbreaks). The results are summarised below. Since deterministic epidemic models only approach 0 but never become fully extinct, the extinction time for the ODE solution is taken to be when
Country | ODE | |||
---|---|---|---|---|
France | 909 | 23.4 | 767 | 894 |
Singapore | 1130 | 7.00 | 965 | 1070 |
South Korea | 3530 | 60.1 | 893 | 3710 |
Sweden | 1870 | 20.8 | 1614 | 1780 |
Thailand | 1640 | 33.5 | 835 | 1630 |
Similar to the expected final size, the expected duration given a major epidemic is close to the ODE solution. The overall expected duration can also be interpreted as a more realistic figure than the ODE solution since it accounts for the probability of only a minor outbreak occurring.
Discussion
The results summarised below illustrate that the selected countries have varied success in managing the COVID-19 pandemic. As mentioned earlier in this report, South Korea seemed to have the epidemic under control. Although the outbreak in South Korea is expected to last for almost as long as any other country, the low rates of
Country | Estimated |
||
---|---|---|---|
France | 0.84 | 21.9 | 767.44 |
Singapore | 0.85 | 53.4 | 965.03 |
South Korea | 0.24 | 0.652 | 892.59 |
Sweden | 0.86 | 49.4 | 1613.81 |
Thailand | 0.50 | 2.23 | 835.4 |
Although the results may seem worrying for Singapore and Sweden, which have an expected 53.4% and 49.4% of their population infected by COVD-19 respectively, it is important to note that these simulations were unable to account for direct public health intervention to reduce disease transmission. In Singapore, where strict physical distancing measures have been introduced, the final size will likely be much lower than expected. Nonetheless, these figures serve to highlight a possible ‘worst case’ scenario where COVID-19 is allowed to transmit freely in a highly mixed, homogeneous population.
Limitations
The BDP model used in this project was mainly limited by its assumptions. Firstly, the model assumed a closed population for each country, and did not consider the effects of births, deaths, or immigration on the epidemic dynamics. This was not a realistic assumption to make since the epidemics have been estimated to last for more than 3 years, during which other demographic effects might have affected the rates of infection and recovery, as well as the population size. From initial reports during the pandemic, it was also understood that immigration was a significant factor in transmission (Pinotti et al. 2020) and should be included in further analysis.
Secondly, for ease of estimating the likelihood, the removal process was simplified to include both recovery and death. This ‘R’ state in SIR is ambiguous and does not capture the severity of the outbreak beyond the number of infected cases. In other words, a country may have a high removal rate leading to a smaller epidemic, but this may or may not be a satisfactory outcome depending on the number of recovered cases relative to deaths. Future work may be undertaken to differentiate between recovered cases and deaths in order to have a clearer representation of the severity of the epidemic.
Furthermore, the Markov property, while useful in simplifying complex phenomena, does not always hold in reality as it does not consider historical trends beyond the past day, when in reality a lot of processes that underpin disease transmission are dependent on time periods much longer than one day. Thus, the BDP model is only useful for early approximations.
It should also be noted that a model is only as good as the data it is fed. The data used in the model only contains reported cases and the proportion of cases reported varies largely depending on different countries’ disease surveillance systems. This may not be representative of the true state of COVID-19 in various countries especially if they lack adequate testing. Nonetheless, the reported number of cases is the most direct way that the state of the epidemic can be observed, even if the statistics may be flawed.
Lastly, since the BDP approximation considers only +1 and -1 transitions between states, it is a time-consuming process to estimate parameters and conduct simulations for countries reported to have a large number of infections. For this reason, countries that were adversely affected by COVID-19 with over 100,000 confirmed cases, including the US, China and Italy, were excluded from this study even though it would have been of value to include them.
Conclusion
In conclusion, this project highlighted an application of birth-death processes to the COVID-19 pandemic. Firstly, it introduced the theoretical formulation of an SIR BDP and its transition probabilities. Secondly, a data augmentation technique was used to overcome the difficulty of computing maximum likelihood estimation on discretely observed BDP. The parameter estimates were then used to simulate outbreaks in different countries, and simulation-based inference was conducted. Lastly, the limitations of applying the BDPs to study the COVID-19 pandemic was discussed at length.
Overall, this project has shown that while the birth-death process is useful in modelling the number of COVID-19 infections, it is largely limited by its assumptions. The model could be refined by introducing more flexibility in order to represent other more complex aspects of the outbreak such as recovery, deaths, demographic trends, and public health interventions.