4 Model description
Here we describe the methods of MOSAIC beta version 0.1. This model version provides a starting point for understanding cholera transmission in Sub-Saharan Africa, incorporating important drivers of disease dynamics such as human mobility, environmental conditions, and vaccination schedules. As MOSAIC continues to evolve, future iterations will refine model components based on available data and improved model mechanisms, which we hope will increase its applicability to real-world scenarios.
The model operates on weekly time steps from January 2023 to August 2024 and includes the 46 countries in Sub-Saharan Africa (SSA) shown in Figure 4.1.
4.1 Transmission dynamics
The model has a metapopulation structure with familiar compartments for Susceptible, Infected, and Recovered individuals with SIRS dynamics. The model also contains compartments for vaccinated individuals (V) and Water & environment based transmission (W) which we refer to as SVIWRS.
The SVIWRS metapopulation model, shown in Figure 4.2, is governed by the following difference equations:
\[\begin{equation} \begin{aligned} S_{j,t+1} &= b_j N_{jt} + S_{jt} - \phi \nu_{jt} S_{jt} + \omega V_{jt} - \Lambda_{j,t+1} - \Psi_{j,t+1} + \varepsilon R_{jt} - d_j S_{jt}\\[11pt] V_{j,t+1} &= V_{jt} + \phi \nu_{jt} S_{jt} - \omega V_{jt} - d_j V_{jt}\\[11pt] I_{j,t+1} &= I_{jt} + \Lambda_{j,t+1} + \Psi_{j,t+1} - \gamma I_{jt} - \mu \sigma I_{jt} - d_j I_{jt}\\[11pt] W_{j,t+1} &= W_{jt} + \zeta I_{jt} - \delta_{jt} W_{jt}\\[11pt] R_{j,t+1} &= R_{jt} + \gamma I_{jt} - \varepsilon R_{jt} - d_j R_{jt}\\[11pt] \end{aligned} \tag{4.1} \end{equation}\]
For descriptions of all parameters in Equation (4.1), see Table (4.14). Transmission dynamics are driven by the two force of infection terms, \(\Lambda_{jt}\) and \(\Psi_{jt}\). The force of infection due to human-to-human (\(\Lambda_{jt}\)) is:
\[\begin{equation} \begin{aligned} \Lambda_{j,t+1} &= \frac{ \beta_{jt}^{\text{hum}} \Big(\big(S_{jt}(1-\tau_{j})\big) \big(I_{jt}(1-\tau_{j}) + \sum_{\forall i \not= j} (\pi_{ij}\tau_iI_{it}) \big)\Big)^\alpha}{N_{jt}}.\\[11pt] \end{aligned} \tag{4.2} \end{equation}\]
Where \(\beta_{jt}^{\text{hum}}\) is the rate of human-to-human transmission. Movement within and among metapopulations is governed by \(\tau_i\), the probability of departing origin location \(i\), and \(\pi_{ij}\) is the relative probability of travel from origin \(i\) to destination \(j\) (see section on spatial dynamics). To include environmental effects, the force of infection due to environment-to-human transmission (\(\Psi_{jt}\)) is defined as:
\[\begin{equation} \begin{aligned} \Psi_{j,t+1} &= \frac{\beta_{jt}^{\text{env}} \big(S_{jt}(1-\tau_{j})\big) (1-\theta_j) W_{jt}}{\kappa+W_{jt}},\\[11pt] \end{aligned} \tag{4.3} \end{equation}\]
where \(\beta_{jt}^{\text{env}}\) is the rate of environment-to-human transmission and \(\theta_j\) is the proportion of the population at location \(j\) that at least basic access to Water, Sanitation, and Hygiene (WASH). The environmental compartment of the model is also scaled by the concentration (cells per mL) of V. cholerae that is required for a 50% probability of infection Fung 2014. See the section on environmental transmission below for more on the water/environment compartment and climatic drivers of transmission.
Note that all model processes are stochastic. Transition rates are converted to probabilities with the commonly used formula \(p(t) = 1-e^{-rt}\) (see Ross 2007), and then integer quantities are moved between model compartments at each time step according to a binomial process like the example below for the recovery of infected individuals (\(\gamma I_{jt}\)):
\[\begin{equation} \frac{\partial R}{\partial t} \sim \text{Binom}(I_{jt}, 1-\exp(-\gamma)) \tag{4.4} \end{equation}\]
4.2 Seasonality
Cholera transmission is seasonal and is typically associated with the rainy season, so both transmission rate terms \(\beta_{jt}^{\text{*}}\) are temporally forced. For human-to-human transmission we used a truncated sine-cosine form of the Fourier series with two harmonic features which has the flexibility to capture seasonal transmission dynamics driven by extended rainy seasons and/or biannual trends:
\[\begin{equation} \beta_{jt}^{\text{hum}} = \beta_{j0}^{\text{hum}} + a_1 \cos\left(\frac{2\pi t}{p}\right) + b_1 \sin\left(\frac{2\pi t}{p}\right) + a_2 \cos\left(\frac{4\pi t}{p}\right) + b_2 \sin\left(\frac{4\pi t}{p}\right) \tag{4.5} \end{equation}\]
Where, \(\beta_{j0}^{\text{hum}}\) is the mean human-to-human transmission rate at location \(j\) over all time steps. Seasonal dynamics are determined by the parameters \(a_1\), \(b_1\) and \(a_2\), \(b_2\) which gives the amplitude of the first and second waves respectively. The periodic cycle \(p\) is 52, so the function controls the temporal variation in \(\beta_{jt}^{\text{hum}}\) over the 52 weeks of the year.
We estimated the parameters in the Fourier series (\(a_1\), \(b_1\), \(a_2\), \(b_2\)) using the Levenberg–Marquardt algorithm in the minpack.lm
R library. Given the lack of reported cholera case data for many countries in SSA and the association between cholera transmission and the rainy season, we leveraged seasonal precipitation data to help fit the Fourier wave function to all countries. We first gathered weekly precipitation values from 1994 to 2024 for 30 uniformly distributed points within each country from the Open-Meteo Historical Weather Data API. Then we fit the Fourier series to the weekly precipitation data and used these parameters as the starting values when fitting the model to the more sparse cholera case data.
For countries with no reported case data, we inferred seasonal dynamics using the fitted wave function of a neighboring country with available case data. The selected neighbor was chosen from the same cluster of countries (grouped hierarchically into four clusters based on precipitation seasonality using Ward’s method; see Figure 4.5) that had the highest correlation in seasonal precipitation with the country lacking case data. In the rare event that no country with reported case data was found within the same seasonal cluster, we expanded the search to the 10 nearest neighbors and continued expanding by adding the next nearest neighbor until a match was found.
Using the model fitting methods described above, and the cluster-based approach for inferring the seasonal Fourier series pattern in countries without reported cholera cases, we modeled the seasonal dynamics for all 41 countries in the MOSAIC framework. These dynamics are visualized in Figure 4.6, with the corresponding Fourier model coefficients presented in Table 4.1.
Country | \(a_1\) | \(a_2\) | \(b_1\) | \(b_2\) |
---|---|---|---|---|
Burundi | -0.31 (-0.35 to -0.27) | -0.49 (-0.53 to -0.45) | -0.42 (-0.45 to -0.39) | -0.36 (-0.41 to -0.31) |
Cameroon | -0.64 (-0.67 to -0.62) | 0.08 (0.04 to 0.11) | 0.01 (-0.04 to 0.06) | -0.28 (-0.31 to -0.26) |
DRC | 0.28 (0.26 to 0.31) | -0.22 (-0.31 to -0.13) | 0.15 (0.08 to 0.23) | -0.2 (-0.28 to -0.13) |
Ethiopia | -0.52 (-0.59 to -0.46) | -0.38 (-0.41 to -0.34) | -0.04 (-0.12 to 0.03) | -0.03 (-0.13 to 0.07) |
Kenya | 0.03 (-0.02 to 0.07) | -0.16 (-0.19 to -0.12) | 0.5 (0.46 to 0.53) | 0.12 (0.09 to 0.15) |
Malawi | 0.51 (0.47 to 0.54) | 0.16 (0.12 to 0.2) | 0.39 (0.36 to 0.43) | 0.4 (0.37 to 0.44) |
Mozambique | 0.22 (0.19 to 0.26) | -0.36 (-0.39 to -0.33) | 0.48 (0.44 to 0.53) | 0.02 (-0.04 to 0.07) |
Nigeria | -0.27 (-0.3 to -0.24) | 0.22 (0.15 to 0.29) | -0.16 (-0.2 to -0.11) | 0.42 (0.38 to 0.46) |
Somalia | -0.16 (-0.2 to -0.12) | -0.24 (-0.3 to -0.17) | 0.84 (0.82 to 0.86) | -0.59 (-0.61 to -0.57) |
South Africa | -0.41 (-0.62 to -0.19) | 0.13 (-0.07 to 0.33) | -0.54 (-0.67 to -0.41) | 0.63 (0.54 to 0.72) |
Tanzania | 0.03 (0 to 0.06) | -0.06 (-0.12 to -0.01) | 0.09 (0.06 to 0.12) | 0.23 (0.2 to 0.26) |
Togo | -0.17 (-0.45 to 0.12) | -0.28 (-0.43 to -0.13) | -0.63 (-0.72 to -0.54) | 0.28 (0.06 to 0.51) |
Uganda | -0.79 (-0.9 to -0.69) | 0.39 (0.32 to 0.46) | 0.12 (-0.07 to 0.31) | 0.04 (-0.16 to 0.23) |
Zambia | 0.69 (0.63 to 0.74) | 0.43 (0.39 to 0.46) | 0.23 (0.18 to 0.28) | 0.2 (0.14 to 0.26) |
Zimbabwe | 1.12 (1.08 to 1.17) | 0.51 (0.48 to 0.53) | 0.3 (0.24 to 0.35) | -0.08 (-0.13 to -0.03) |
4.3 Environmental transmission
Environmental transmission is a critical factor in cholera spread and consists of several key components: the rate at which infected individuals shed V. cholerae into the environment, the pathogen’s survival rate in environmental conditions, and the overall suitability of the environment for sustaining the bacteria over time.
4.3.1 Climate-driven transmission
To capture the impacts of climate-drivers on cholera transmission, we have included the parameter \(\psi_{jt}\), which represents the current state of environmental suitability with respect to: i) the survival time of V. cholerae in the environment and, ii) the rate of environment-to-human transmission which contributes to the overall force of infection.
\[\begin{equation} \beta_{jt}^{\text{env}} = \beta_{j0}^{\text{env}} \Bigg(1 + \frac{\psi_{jt}-\bar\psi_{jt}}{\bar\psi_{jt}} \Bigg) \quad \text{and} \quad \bar\psi_{jt} = \frac{1}{T} \sum_{t=1}^{T} \psi_{jt} \tag{4.6} \end{equation}\]
This formulation effectively scales the base environmental transmission rate \(\beta_{jt}^{\text{env}}\) so that it varies over time according to the climatically driven model of suitability. Note that, unlike the the cosine wave function of \(\beta_{jt}^{\text{hum}}\), this temporal term can increase or decrease over time following multi-annual cycles.
[Fig: Example temporal forcing of environment-to-human transmission]
Environmental suitability (\(\psi_{jt}\)) also impacts the survival rate of V. cholerae in the environment (\(\delta_{jt}\)) with the form:
\[\begin{equation} \delta_{jt} = \delta_{\text{min}} + \psi_{jt} \times (\delta_{\text{max}} - \delta_{\text{min}}) \tag{4.7} \end{equation}\]
which normalizes the variance of the suitability parameter to be bounded within the minimum (\(\delta_{\text{min}}\)) and maximum (\(\delta_{\text{max}}\)) survival times of V. cholerae.
4.3.2 Modeling suitability
The environmental suitability (\(\psi_{jt}\)) of V. cholerae is modeled as a time series for each location, using covariates that include environmental factors, past and present climate measures, severe weather events, and large-scale regional climate drivers. Most of these factors are influenced by climate change, so we will source data that projects each covariate into the future under different climate change scenarios. Environmental suitability, \(\psi_{jt}\), is generally defined as:
\[ \psi_{jt} = f(\text{temperature, precipitation, humidity, wind speed, soil moisture}) \]
The function \(f(\cdot)\) can be modeled using various approaches, including Generalized Linear Models (GLMs), Generalized Additive Models (GAMs), Boosted Regression Trees (BRTs), or machine learning methods such as Recurrent Neural Networks (RNNs) or Long Short-Term Memory Networks (LSTMs). A simpler approach might involve Bayesian variable selection using the BAS
R package. The model will be fitted to all available data, with projections of suitability for each location. Implementing a rolling-window validation across the time series would help assess model performance. The model can be directly fitted to reported case counts or converted to a binary threshold, depending on the analysis needs. The primary goal is to explain a portion of the variance in reported case counts as a proxy for environmental suitability.
Covariates will include both historical climate variables and those predicted under climate change scenarios. For example, the MRI-AGCM3-2-S and EC_Earth3P_HR models provide weather variables at ~20km resolution, including temperature, relative humidity, wind, precipitation, solar radiation, cloud cover, and soil moisture. These covariates should have time-lagged or short-term cumulative versions. The initial version of the model will likely use data from the OpenMeteo Historical Weather Data API. Additional data sources will be integrated into subsequent versions of the suitability model. For instance, flood and cyclone data will be incorporated later, though not in the initial version of the model. We will also seek data on ENSO (El Niño, Neutral, La Niña) and the Indian Ocean sea surface temperature index. Open-source projections of these variables into the near future (months to a year or two) will likely be available.
4.3.3 Shedding
The rate at which infected individuals shed V. cholerae into the environment (\(\zeta\)) is a critical factor influencing cholera transmission. Shedding rates can vary widely depending on the severity of the infection, the immune response of the individual, and environmental factors. According to Fung 2014, the shedding rate is estimated to range from 0.01 to 10 cells per mL per person per day.
Further studies support these findings, indicating that shedding rates can indeed fluctuate significantly. For instance, Nelson et al (2009) note that during the, depending on the phase of infection, individuals can shed \(10^3\) (asymptomatic cases) to \(10^12\) (severe cases) V. cholerae cells per gram of stool. Future version of the model may attempt to capture the nuances of shedding dynamics, but here we make the simplifying assumption that shedding is constant across infected individuals and has a wide range of variability with no prior distributional assumptions:
\[ \zeta \sim \text{Uniform}(0.01, 10). \]
4.3.4 WAter, Sanitation, and Hygiene (WASH)
Since V. cholerae is transmitted through fecal contamination of water and other consumables, the level of exposure to contaminated substrates significantly impacts transmission rates. Interventions involving Water, Sanitation, and Hygiene (WASH) have long been a first line of defense in reducing cholera transmission, and in this context, WASH variables can serve as proxy for the rate of contact with environmental risk factors. In the MOSAIC model, WASH variables are incorporated mechanistically, allowing for intervention scenarios that include changes to WASH. However, it is necessary to distill available WASH variables into a single parameter that represents the WASH-determined contact rate with contaminated substrates for each location \(j\), which we define as \(\theta_j\).
To parameterize \(\theta_j\), we calculated a weighted mean of the 8 WASH variables in Sikder et al 2023 and originally modeled by the Local Burden of Disease WaSH Collaborators 2020. The 8 WASH variables (listed in Table 4.3) provide population-weighted measures of the proportion of the population that either: i) have access to WASH resources (e.g., piped water, septic or sewer sanitation), or ii) are exposed to risk factors (e.g. surface water, open defecation). For risk associated WASH variables, we used the complement (\(1-\text{value}\)) to give the proportion of the population not exposed to each risk factor. We used the optim
function in R and the L-BFGS-B algorithm to estimate the set of optimal weights (Table 4.3) that maximize the correlation between the weighted mean of the 8 WASH variables and reported cholera incidence per 1000 population across 40 SSA countries from 2000 to 2016. The optimal weighted mean had a correlation coefficient of \(r =\) -0.33 (-0.51 to -0.09 95% CI) which was higher than the basic mean and all correlations provided by the individual WASH variables (see Figure 4.8). The weighted mean then provides a single variable between 0 and 1 that represents the overall proportion of the population that has access to WASH and/or is not exposed to environmental risk factors. Thus, the WASH-mediated contact rate with sources of environmental transmission is represented as (\(1-\theta_j\)) in the environment-to-human force of infection (\(\Psi_{jt}\)). Values of \(\theta_j\) for all countries are shown in Figure 4.9.
WASH variable | Optimized weight |
---|---|
Piped Water | 0.356 |
Septic or Sewer Sanitation | 0.014 |
Other Improved Water | 0.000 |
Other Improved Sanitation | 0.000 |
Surface Water | 0.504 |
Unimproved Sanitation | 0.000 |
Unimproved Water | 0.000 |
Open Defecation | 0.126 |
4.4 Immune dynamics
4.4.1 Immunity from vaccination
The impacts of Oral Cholera Vaccine (OCV) campaigns is incorporated into the model through the Vaccinated compartment (V). The rate that individuals are effectively vaccinated is defined as \(\phi\nu_tS_{jt}\), where \(S_{jt}\) are the available number of susceptible individuals in location \(j\) at time \(t\), \(\nu_t\) is the number of OCV doses administered at time \(t\) and \(\phi\) is the estimated vaccine effectiveness. Note that there is just one vaccinated compartment at this time, though future model versions may include \(V_1\) an \(V_2\) compartments to explore two dose vaccination strategies or to emulate more complex waning patterns.
The vaccination rate \(\nu_t\) is not an estimated quantity. Rather, it is directly defined by the reported number of OCV doses administered on the WHO OCV dashboard here: https://www.who.int/groups/icg/cholera.
\[ \nu_t := \text{Reported rate of OCV administration} \]
The evidence for waning immunity comes from 4 cohort studies (Table 4.5) from Bangladesh (Qadri et al 2016 and 2018), South Sudan (Azman et al 2016), and Democratic Republic of Congo (Malembaka et al 2024).
Effectiveness | Upper CI | Lower CI | Day (midpoint) | Day (min) | Day (max) | Source |
---|---|---|---|---|---|---|
0.873 | 0.990 | 0.702 | 60.0 | NA | NA | Azman et al (2016) |
0.400 | 0.600 | 0.110 | 93.5 | 7 | 180 | Qadri et al (2016) |
0.390 | 0.520 | 0.230 | 368.5 | 7 | 730 | Qadri et al (2018) |
0.527 | 0.674 | 0.314 | 435.0 | 360 | 510 | Malembaka et al (2024) |
0.447 | 0.594 | 0.248 | 900.0 | 720 | 1080 | Malembaka et al (2024) |
We estimated vaccine effectiveness and waning immunity by fitting an exponential decay model to the reported effectiveness of one dose OCV in these studies using the following formulation:
\[\begin{equation} \text{Proportion immune}\ t \ \text{days after vaccination} = \phi \times (1 - \omega) ^ {t-t_{\text{vaccination}}} \tag{4.8} \end{equation}\]
Where \(\phi\) is the effectiveness of one dose OCV, and the based on this specification, it is also the initial proportion immune directly after vaccination. The decay rate parameter \(\omega\) is the rate at which initial vaccine derived immunity decays per day post vaccination, and \(t\) and \(t_{\text{vaccination}}\) are the time (in days) the function is evaluated at and the time of vaccination respectively. When we fitted the model to the data from the cohort studies shown in Table (4.5) we found that \(\omega = 0.00057\) (\(0-0.0019\) 95% CI), which gives a mean estimate of 4.8 years for vaccine derived immune duration with unreasonably large confidence intervals (1.4 years to infinite immunity). However, the point estimate of 4.8 years is consistent with anecdotes that one dose OCV is effective for up to at least 3 years.
The wide confidence intervals are likely due to the wide range of reported estimates for proportion immune after a short duration in the 7–90 days range (Azman et al 2016 and Qadri et al 2016). Therefore, we chose to use the point estimate of \(\omega\) and incorporate uncertainty based on the initial proportion immune (i.e. vaccine effectiveness \(\phi\)) shortly after vaccination. Using the decay model in Equation (4.8) we estimated \(\phi\) to be \(0.64\) (\(0.32-0.96\) 95% CI). We then fit a Beta distribution to the quantiles of \(\phi\) by minimizing the sums of squares using the Nelder-Mead optimization algorithm to render the following distribution (shown in Figure 4.10B):
\[\begin{equation} \phi \sim \text{Beta}(4.57, 2.41). \tag{4.9} \end{equation}\]
4.4.2 Immunity from natural infection
The duration of immunity after a natural infection is likely to be longer lasting than that from vaccination with OCV (especially given the current one dose strategy). As in most SIR-type models, the rate at which individuals leave the Recovered compartment is governed by the immune decay parameter \(\varepsilon\). We estimated the durability of immunity from natural infection based on two cohort studies and fit the following exponential decay model to estimate the rate of immunity decay over time:
\[ \text{Proportion immune}\ t \ \text{days after infection} = 0.99 \times (1 - \varepsilon) ^ {t-t_{\text{infection}}} \] Where we make the necessary and simplifying assumption that within 0–90 days after natural infection with V. cholerae, individuals are 95–99% immune. We fit this model to reported data from Ali et al (2011) and Clemens et al (1991) (see Table 4.6).
Day | Effectiveness | Upper CI | Lower CI | Source |
---|---|---|---|---|
90 | 0.95 | 0.95 | 0.95 | Assumption |
1080 | 0.65 | 0.81 | 0.37 | Ali et al (2011) |
1260 | 0.61 | 0.81 | 0.21 | Clemens et al (1991) |
We estimated the mean immune decay to be \(\bar\varepsilon = 3.9 \times 10^{-4}\) (\(1.7 \times 10^{-4}-1.03 \times 10^{-3}\) 95% CI) which is equivalent to an immune duration of \(7.21\) years (\(2.66-16.1\) years 95% CI) as shown in Figure 4.11A. This is slightly longer than previous modeling work estimating the duration of immunity to be ~5 years (King et al 2008). Uncertainty around \(\varepsilon\) in the model is then represented by a Log-Normal distribution as shown in Figure 4.11B:
\[ \varepsilon \sim \text{Lognormal}(\bar\varepsilon+\frac{\sigma^2}{2}, 0.25) \]
4.5 Spatial dynamics
The parameters in the model diagram in Figure 4.2 that have a \(jt\) subscript denote the spatial structure of the model. Each country is modeled as an independent metapopulation that is connected to all others via the spatial force of infection \(\Lambda_{jt}\) which moves contagion among metapopulations according to the connectivity provided by parameters \(\tau_i\) (the probability departure) and \(\pi_{ij}\) (the probability of diffusion to destination \(j\)). Both parameters are estimated using the departure-diffusion model below which is fitted to average weekly air traffic volume between all of the 41 countries included in the MOSAIC framework (Figure 4.12).
4.5.1 Human mobility model
The departure-diffusion model estimates diagonal and off-diagonal elements in the mobility matrix (\(M\)) separately and combines them using conditional probability rules. The model first estimates the probability of travel outside the origin location \(i\)—the departure process—and then the distribution of travel from the origin location \(i\) by normalizing connectivity values across all \(j\) destinations—the diffusion process. The values of \(\pi_{ij}\) sum to unity along each row, but the diagonal is not included, indicating that this is a relative quantity. That is to say, \(\pi_{ij}\) gives the probability of going from \(i\) to \(j\) given that travel outside origin \(i\) occurs. Therefore, we can use basic conditional probability rules to define the travel routes in the diagonal elements (trips made within the origin \(i\)) as \[ \Pr( \neg \text{depart}_i ) = 1 - \tau_i \] and the off-diagonal elements (trips made outside origin \(i\)) as \[ \Pr( \text{depart}_i, \text{diffuse}_{i \rightarrow j}) = \Pr( \text{diffuse}_{i \rightarrow j} \mid \text{depart}_i ) \Pr(\text{depart}_i ) = \pi_{ij} \tau_i. \] The expected mean number of trips for route \(i \rightarrow j\) is then:
\[\begin{equation} M_{ij} = \begin{cases} \theta N_i (1-\tau_i) \ & \text{if} \ i = j \\ \theta N_i \tau_i \pi_{ij} \ & \text{if} \ i \ne j. \end{cases} \tag{4.10} \end{equation}\]
Where, \(\theta\) is a proportionality constant representing the overall number of trips per person in an origin population of size \(N_i\), \(\tau_i\) is the probability of leaving origin \(i\), and \(\pi_{ij}\) is the probability of travel to destination \(j\) given that travel outside origin \(i\) occurs.
4.5.2 Estimating the departure process
The probability of travel outside the origin is estimated for each location \(i\) to give the location-specific departure probability \(\tau_i\). \[ \tau_i \sim \text{Beta}(1+s, 1+r) \] Binomial probabilities for each origin \(\tau_i\) are drawn from a Beta distributed prior with shape (\(s\)) and rate (\(r\)) parameters. \[ \begin{aligned} s &\sim \text{Gamma}(0.01, 0.01)\\ r &\sim \text{Gamma}(0.01, 0.01) \end{aligned} \]
4.5.3 Estimating the diffusion process
We use a normalized formulation of the power law gravity model to defined the diffusion process, the probability of travelling to destination \(j\) given travel outside origin \(i\) (\(\pi_{ij}\)) which is defined as:
\[\begin{equation} \pi_{ij} = \frac{ N_j^\omega d_{ij}^{-\gamma} }{ \sum\limits_{\forall j \ne i} N_j^\omega d_{ij}^{-\gamma} } \tag{4.11} \end{equation}\]
Where, \(\omega\) scales the attractive force of each \(j\) destination based on its population size \(N_j\). The kernel function \(d_{ij}^{-\gamma}\) serves as a penalty on the proportion of travel from \(i\) to \(j\) based on distance. Prior distributions of diffusion model parameters are defined as: \[ \begin{aligned} \omega &\sim \text{Gamma}(1, 1)\\ \gamma &\sim \text{Gamma}(1, 1) \end{aligned} \]
The models for \(\tau_i\) and \(\pi_{ij}\) were fitted to air traffic data from OAG using the mobility
R package (Giles 2020). Estimates for mobility model parameters are shown in Figures 4.13 and 4.14.
4.5.4 The probability of spatial transmission
The likelihood of introductions of cholera from disparate locations is a major concern during cholera outbreaks. However, this can be difficult to characterize given the endemic dynamics and patterns of human movement. We include a few measures of spatial heterogeneity here and the first is a simple importation probability based on connectivity and the possibility of incoming infections. The basic probability of transmission from an origin \(i\) to a particular destination \(j\) and time \(t\) is defined as:
\[\begin{equation} p(i,j,t) = 1 - e^{-\beta_{jt}^{\text{hum}} (((1-\tau_j)S_{jt})/N_{jt}) \pi_{ij}\tau_iI_{it}} \tag{4.12} \end{equation}\]
4.5.5 The spatial hazard
Although we are more concerned with endemic dynamics here, there are likely to be periods of time early in the rainy season where cholera cases and the rate of transmission is low enough for spatial spread to resemble epidemic dynamics for a time. During such times periods, we can estimate the arrival time of contagion for any location where cases are yet to be reported. We do this be estimating the spatial hazard of transmission:
\[\begin{equation} h(j,t) = \frac{ \beta_{jt}^{\text{hum}} \Big(1 - \exp\big(-((1-\tau_j)S_{jt}/N_{jt}) \sum_{\forall i \not= j} \pi_{ij}\tau_i (I_{it}/N_{it}) \big) \Big) }{ 1/\big(1 + \beta_{jt}^{\text{hum}} (1-\tau_j)S_{jt}\big) }. \tag{4.13} \end{equation}\]
And then normalizing to give the waiting time distribution for all locations:
\[\begin{equation} w(j,t) = h(j,T) \prod_{t=1}^{T-1}1-h(j,t). \tag{4.14} \end{equation}\]
4.5.6 Coupling among locations
Another measure of spatial heterogeneity is to quantify the coupling of disease dynamics among metapopulations using a correlation coefficient. Here, we use the definition of spatial correlation between locations \(i\) and \(j\) as \(C_{ij}\) described in Keeling and Rohani (2002), which gives a measure of how similar infection dynamics are between locations.
\[\begin{equation} C_{ij} = \frac{ ( y_{it} - \bar{y}_i )( y_{jt} - \bar{y}_j ) }{ \sqrt{\text{var}(y_i) \text{var}(y_j)} } \tag{4.15} \end{equation}\] Where \(y_{it} = I_{it}/N_i\) and \(y_{jt} = I_{jt}/N_j\). Mean prevalence in each location is \(\bar{y_i} = \frac{1}{T} \sum_{t=1}^{T} y_{it}\) and \(\bar{y_j} = \frac{1}{T} \sum_{t=1}^{T} y_{jt}\).
4.6 The observation process
4.6.1 Rate of symptomatic infection
The presentation of infection with V. cholerae can be extremely variable. The severity of infection depends many factors such as the amount of the infectious dose, the age of the host, the level of immunity of the host either through vaccination or previous infection, and naivety to the particular strain of V. cholerae. Additional circumstantial factors such as nutritional status and overall pathogen burden may also impact infection severity. At the population level, the observed proportion of infections that are symptomatic is also dependent on the endemicity of cholera in the region. Highly endemic areas (e.g. parts of Bangladesh; Hegde et al 2024) may have a very low proportion of symptomatic infections due to many previous exposures. Inversely, populations that are largely naive to V. cholerae will exhibit a relatively higher proportion of symptomatic infections (e.g. Haiti; Finger et al 2024).
Accounting for all of these nuances in the first version of this model not possible, but we can past studies do contain some information that can help to set some sensible bounds on our definition for the proportion of infections that are symptomatic (\(\sigma\)). So we have compiled a short list of studies that have done sero-surveys and cohort studies to assess the likelihood of symptomatic infections in different locations and displayed those results in Table (4.7).
To provide a reasonably informed prior for the proportion of infections that are symptomatic, we calculated the combine mean and confidence intervals of all studies in Table 4.7 and fit a Beta distribution that corresponds to these quantiles using least-squares and a Nelder-Mead algorithm. The resulting prior distribution for the symptomatic proportion \(\sigma\) is:
\[\begin{equation} \sigma \sim \text{Beta}(4.30, 13.51) \end{equation}\]
Mean | Low CI | High CI | Location | Source | Note |
---|---|---|---|---|---|
0.570 | NA | NA | NA | Nelson et al (2009) | Review |
NA | 1.000 | 0.250 | NA | Lueng & Matrajt (2021) | Review |
NA | 0.600 | 0.200 | Endemic regions | Harris et al (2012) | Review |
0.238 | 0.250 | 0.227 | Haiti | Finger et al (2024) | Sero-survey and clinical data |
0.213 | 0.231 | 0.194 | Haiti | Jackson et al (2013) | Cross-sectional sero-survey |
0.204 | NA | NA | Pakistan | Bart et al (1970) | Sero-survey during epidemic; El Tor Ogawa strain |
0.371 | NA | NA | Pakistan | Bart et al (1970) | Sero-survey during epidemic; Inaba strain |
0.184 | 0.256 | 0.112 | Bangladesh | Harris et al (2008) | Household cohort; mean of all age groups |
0.001 | 0.000 | 0.001 | Bangladesh | Hegde et al (2024) | Sero-survey and clinical data |
The prior distribution for \(\sigma\) is plotted in Figure 4.15A with the reported values of the proportion symptomatic from previous studies shown in 4.15B.
4.6.2 Suspected cases
The clinical presentation of diarrheal diseases is often similar across various pathogens, which can lead to systematic biases in the reported number of cholera cases. It is anticipated that the number of suspected cholera cases is related to the actual number of infections by a factor of \(1/\rho\), where \(\rho\) represents the proportion of suspected cases that are true infections. To adjust for this bias, we use estimates from the meta-analysis by Weins et al. (2023), which suggests that suspected cholera cases outnumber true infections by approximately 2 to 1, with a mean across studies indicating that 52% (24-80% 95% CI) of suspected cases are actual cholera infections. A higher estimate was reported for ourbreak settings (78%, 40-99% 95% CI). To account for the variability in this estimate, we fit a Beta distribution to the reported quantiles using a least squares approach and the Nelder-Mead algorithm, resulting in the prior distribution shown in Figure 4.16B:
\[\begin{equation} \rho \sim \text{Beta}(4.79, 1.53). \end{equation}\]
4.6.3 Case fatality rate
The Case Fatality Rate (CFR) among symptomatic infections was calculated using reported cases and deaths data from January 2021 to August 2024. The data were collated from various issues of the WHO Weekly Epidemiological Record the Global Cholera and Acute Watery Diarrhea (AWD) Dashboard (see Data section) which provide annual aggregations of reported cholera cases and deaths. We then used the Binomial exact test (binom.test
in R) to calculate the mean probability for the number of deaths (successes) given the number of reported cases (sample size), and the Clopper-Pearson method for calculating the binomial confidence intervals. We then fit Beta distributions to the mean CFR and 95% confidence intervals calculated for each country using least squares and the Nelder-Mead algorithm to give the distributional uncertainty around the CFR estimate for each country (\(\mu_j\)).
\[ \mu_j \sim \text{Beta}(s_{1,j}, s_{2,j}) \]
Where \(s_{1,i}\) and \(s_{2,j}\) are the two positive shape parameters of the Beta distribution estimated for destination \(j\). By definition \(\mu_j\) is the CFR for reported cases which are a subset of the total number of infections. Therefore, to infer the total number of deaths attributable to cholera infection, we assume that the CFR of observed cases is proportionally equivalent to the CFR of all cases and then calculate total deaths \(D\) as follows:
\[\begin{equation} \begin{aligned} \text{CFR}_{\text{observed}} &= \text{CFR}_{\text{total}}\\ \\[3pt] \frac{[\text{observed deaths}]}{[\text{observed cases}]} &= \frac{[\text{total deaths}]}{[\text{all infections}]}\\ \\[3pt] \text{total deaths} &= \frac{[\text{observed deaths}] \times [\text{true infections}]}{[\text{observed cases}]}\\ \\[3pt] D_{jt} &= \frac{ [\sigma\rho\mu_j I_{jt}] \times [I_{jt}] }{ [\sigma\rho I_{jt}] } \end{aligned} \end{equation}\]
Country | Cases (2014-2024) | Deaths (2014-2024) | CFR | CFR Lower | CFR Upper | Beta Shape1 | Beta Shape2 |
---|---|---|---|---|---|---|---|
AFRO Region | 563496 | 11203 | 0.020 | 0.020 | 0.020 | 0.008 | 1.911 |
Benin | 1731 | 25 | 0.014 | 0.009 | 0.021 | 0.009 | 1.918 |
Burundi | 2121 | 12 | 0.006 | 0.003 | 0.010 | 0.007 | 1.926 |
Cameroon | 21685 | 511 | 0.024 | 0.022 | 0.026 | 0.009 | 1.905 |
Comoros | 10342 | 149 | 0.014 | 0.012 | 0.017 | 0.008 | 1.913 |
Democratic Republic of Congo | 103998 | 1218 | 0.012 | 0.011 | 0.012 | 0.007 | 1.921 |
Ethiopia | 54872 | 652 | 0.012 | 0.011 | 0.013 | 0.007 | 1.921 |
Kenya | 12921 | 213 | 0.016 | 0.014 | 0.019 | 0.008 | 1.915 |
Liberia | 534 | 0 | 0.020 | 0.020 | 0.020 | 0.008 | 1.911 |
Malawi | 58961 | 1756 | 0.030 | 0.028 | 0.031 | 0.010 | 1.900 |
Mozambique | 57490 | 217 | 0.004 | 0.003 | 0.004 | 0.006 | 1.887 |
Niger | 5591 | 166 | 0.030 | 0.025 | 0.034 | 0.010 | 1.892 |
Nigeria | 145012 | 4514 | 0.031 | 0.030 | 0.032 | 0.010 | 1.901 |
Somalia | 19995 | 120 | 0.006 | 0.005 | 0.007 | 0.006 | 1.906 |
South Africa | 1403 | 47 | 0.033 | 0.025 | 0.044 | 0.012 | 2.008 |
South Sudan | 1895 | 3 | 0.002 | 0.000 | 0.005 | 0.006 | 1.894 |
Tanzania | 5453 | 96 | 0.018 | 0.014 | 0.021 | 0.009 | 1.914 |
Uganda | 342 | 15 | 0.044 | 0.025 | 0.071 | 0.014 | 1.863 |
Zambia | 24348 | 754 | 0.031 | 0.029 | 0.033 | 0.010 | 1.923 |
Zimbabwe | 34554 | 720 | 0.021 | 0.019 | 0.022 | 0.009 | 1.906 |
4.7 Demographics
The model includes basic demographic change by using reported birth and death rates for each of the \(j\) countries, \(b_j\) and \(d_j\) respectively. These rates are static and defined by the United Nations Department of Economic and Social Affairs Population Division World Population Prospects 2024. Values for \(b_j\) and \(d_j\) are derived from crude rates and converted to birth rate per day and death rate per day (shown in Table 4.10).
Country | Population | Birth rate | Death rate |
---|---|---|---|
Algeria | 45831343 | 0.0000542 | 1.28e-05 |
Angola | 36186956 | 0.0001046 | 1.93e-05 |
Benin | 13934166 | 0.0000940 | 2.44e-05 |
Botswana | 2459937 | 0.0000683 | 1.58e-05 |
Burkina Faso | 22765636 | 0.0000877 | 2.21e-05 |
Burundi | 13503998 | 0.0000935 | 1.87e-05 |
Cameroon | 27997833 | 0.0000937 | 1.99e-05 |
Cape Verde | 521047 | 0.0000339 | 1.39e-05 |
Central African Republic | 5064592 | 0.0001292 | 2.63e-05 |
Chad | 18767684 | 0.0001196 | 3.11e-05 |
Comoros | 842267 | 0.0000793 | 1.99e-05 |
Congo | 6108142 | 0.0000849 | 1.74e-05 |
Côte d’Ivoire | 30783520 | 0.0000887 | 2.12e-05 |
Democratic Republic of Congo | 104063312 | 0.0001150 | 2.37e-05 |
Equatorial Guinea | 1825480 | 0.0000821 | 2.18e-05 |
Eritrea | 3438999 | 0.0000789 | 1.67e-05 |
Eswatini | 1224706 | 0.0000663 | 2.12e-05 |
Ethiopia | 127028360 | 0.0000886 | 1.65e-05 |
Gabon | 2457715 | 0.0000766 | 1.74e-05 |
Gambia | 2666786 | 0.0000843 | 1.74e-05 |
Ghana | 33467371 | 0.0000728 | 1.95e-05 |
Guinea | 14229395 | 0.0000939 | 2.53e-05 |
Guinea-Bissau | 2129290 | 0.0000832 | 1.95e-05 |
Kenya | 54793511 | 0.0000750 | 2.00e-05 |
Lesotho | 2298496 | 0.0000664 | 2.93e-05 |
Liberia | 5432670 | 0.0000858 | 2.24e-05 |
Madagascar | 30813475 | 0.0000890 | 2.09e-05 |
Malawi | 20832833 | 0.0000871 | 1.49e-05 |
Mali | 23415909 | 0.0001113 | 2.40e-05 |
Mauritania | 4948362 | 0.0000957 | 1.54e-05 |
Mauritius | 1274659 | 0.0000254 | 2.39e-05 |
Mozambique | 33140626 | 0.0001042 | 1.95e-05 |
Namibia | 2928037 | 0.0000718 | 1.71e-05 |
Niger | 25727295 | 0.0001167 | 2.47e-05 |
Nigeria | 225494749 | 0.0000912 | 3.25e-05 |
Rwanda | 13802596 | 0.0000785 | 1.64e-05 |
São Tomé & Príncipe | 228558 | 0.0000780 | 1.54e-05 |
Senegal | 17867073 | 0.0000816 | 1.55e-05 |
Seychelles | 126694 | 0.0000377 | 2.27e-05 |
Sierra Leone | 8368119 | 0.0000848 | 2.30e-05 |
Somalia | 18031404 | 0.0001198 | 2.74e-05 |
South Africa | 62796883 | 0.0000518 | 2.55e-05 |
South Sudan | 11146895 | 0.0000807 | 2.71e-05 |
Tanzania | 65657004 | 0.0000979 | 1.61e-05 |
Togo | 9196283 | 0.0000863 | 2.13e-05 |
Uganda | 47981110 | 0.0000978 | 1.35e-05 |
Zambia | 20430382 | 0.0000919 | 1.45e-05 |
Zimbabwe | 16203259 | 0.0000840 | 2.10e-05 |
4.8 The reproductive number
The reproductive number is a common metric of epidemic growth that represents the average number of secondary cases generated by a primary case at a specific time during an epidemic. We track how \(R\) changes over time by estimating the instantaneous reproductive number \(R_t\) as described in Cori et al 2013. We track \(R_t\) across all metapopulations in the model to give \(R_{jt}\) using the following formula:
\[\begin{equation} R_{jt} = \frac{I_{jt}}{\sum_{\Delta t=1}^{t} g(\Delta t) I_{j,t-\Delta t}} \tag{4.16} \end{equation}\]
Where \(I_{jt}\) is the number of new infections in destination \(j\) at time \(t\), and \(g(\Delta t)\) represents the probability value from the generation time distribution of cholera. This is accomplished by using the weighed sum in the denominator which is highly influenced by the generation time distribution.
4.8.1 The generation time distribution
The generation time distribution gives the time between when an individual is infected and when they infect subsequent individuals. We parameterized this quantity using a Gamma distribution with a mean of 5 days:
\[\begin{equation} g(\cdot) \sim \text{Gamma}(0.5, 0.1). \tag{4.17} \end{equation}\]
Here, shape=0.5, rate=0.1, and the mean if given by shape/rate. Previous studies use a mean of 5 days (Kahn et al 2020 and Azman 2016), however a mean of 3, 5, 7, or 10 days may be admissible (Azman 2012).
Interval | Week | Probability |
---|---|---|
[0,7] | 1 | 0.704 |
(7,14] | 2 | 0.178 |
(14,21] | 3 | 0.068 |
(21,28] | 4 | 0.028 |
(28,35] | 5 | 0.012 |
(35,42] | 6 | 0.006 |
(42,49] | 7 | 0.003 |
(49,56] | 8 | 0.001 |
4.9 Initial conditions
Since this first version of the model will begin on Jan 2023 (to take advantage of available weekly data), the initial conditions surrounding population immunity must be estimated. To set these initial conditions, we use historical data to find the total number of reported cases for a location over the previous X years, multiply by \(1/\sigma\) to estimate total infections from those symptomatic cases that are reported, and then adjust based on waning immunity. We also sum the total number of vaccinations over the past X years and adjust for vaccine efficacy \(\phi\) and waning immunity from vaccination \(\omega\).
total number infected? From reported cases… back out symptomatic and asymptomatic
Total number immune due to natural infections in the past X years
total number immune due to past vaccinations in the X years
Use deconvolution based on immune decay estimated in vaccine section
4.10 Model calibration
The model will be calibrated using Latin hypercube sampling for hyper-parameters and model likelihoods fit to incidence and deaths.
An important challenge is flexibly fitting to data that are often missing or only available in aggregated forms.
[Fig: different spatial and temporal scales of available data]
4.11 Caveats
- Simplest model to start. Easier for initial spatial structure but with minimum additional compartments to calibrate to available data (vaccination, cases, deaths).
- Country level aggregations. First generation data is 2023/24…
- Assumes vaccinating susceptible only individuals.
- For climate, summarizing for whole country.
4.12 Table of parameters
Parameter | Description | Distribution | Source |
---|---|---|---|
\(i\) | Index \(i\) represents the origin metapopulation. | ||
\(j\) | Index \(j\) represents the destination metapopulation. | ||
\(t\) | Index \(t\) is the time step which is one week (7 days). | ||
\(b_j\) | Birth rate of population \(j\). | ||
\(d_j\) | Overall mortality rate of population \(j\). | ||
\(N_{jt}\) | Total population size of destination \(j\) at time \(t\). | ||
\(S_{jt}\) | Number of susceptible individuals in destination \(j\) at time \(t\). | ||
\(V_{jt}\) | Number of effectively vaccinated individuals in destination \(j\) at time \(t\). | ||
\(I_{jt}\) | Number of infected individuals in destination \(j\) at time \(t\). | ||
\(W_{jt}\) | Total amount of V. cholerae in the environment in destination \(j\) at time \(t\). | ||
\(R_{jt}\) | Number of recovered (and therefore immune) individuals in destination \(j\) at time \(t\). | ||
\(\Lambda_{j,t+1}\) | The force of infection due to human-to-human transmission in destination \(j\) at time step \(t+1\). | ||
\(\Psi_{j,t+1}\) | The force of infection due to environment-to-human transmission in destination \(j\) at time step \(t+1\). | ||
\(\phi\) | The effectiveness of Oral Cholera Vaccine (OCV). | ||
\(\nu_{jt}\) | The reported rate of vaccination with OCV in destination \(j\) at time \(t\). | ||
\(\omega\) | Rate of waning immunity of vaccinated individuals. | ||
\(\varepsilon\) | Rate of waning immunity of recovered individuals. | ||
\(\gamma\) | Recovery rate of infected individuals. | ||
\(\mu\) | Mortality rate due to infection with V. cholerae. | ||
\(\sigma\) | Proportion of V. cholerae infections that are symptomatic. | ||
\(\rho\) | The proportion of suspected cholera cases that are true infections. | ||
\(\zeta\) | Rate that infected individuals shed V. cholerae into the environment. | \(0.1\mbox{-}10\) | |
\(\delta\) | The environmental suitability dependent decay rate of V. cholerae in the environment. | ||
\(\delta_{\text{min}}\) | The minimum decay rate of V. cholerae in the environment obtained when environmental suitability (\(\psi_{jt}\)) = 0. | ||
\(\delta_{\text{max}}\) | The maximum decay rate of V. cholerae in the environment obtained when environmental suitability (\(\psi_{jt}\)) = 1. | ||
\(\psi_{jt}\) | The climatically driven environmental suitability of V. cholerae in destination \(j\) at time \(t\). | ||
\(\beta_{j0}^{\text{hum}}\) | The baseline rate of human-to-human transmission in destination \(j\). | ||
\(\beta_{jt}^{\text{hum}}\) | The seasonal rate of human-to-human transmission in destination \(j\) at time \(t\). | ||
\(\beta_{j0}^{\text{env}}\) | The baseline rate of environment-to-human transmission in destination \(j\). | ||
\(\beta_{jt}^{\text{env}}\) | The rate of environment-to-human transmission in destination \(j\) at time \(t\). | ||
\(\alpha\) | The overall population mixing parameter. | 0.95 | Glass et al 2003 |
\(\tau_i\) | The probability that an individual departs origin \(i\). | ||
\(\pi_{ij}\) | The probability that an individual travels from origin \(i\) to destination \(j\) given that they have departed origin \(i\). | ||
\(\theta_{j}\) | The proportion of the population that have adequate Water, Sanitation and Hygiene (WASH). | ||
\(\kappa\) | The concentration (number of cells per mL) of V. cholerae required for a 50% probability of infection. | \(10^5\mbox{-}10^6\) | Fung 2014 |