| Title: | Serial Interval and Case Reproduction Number Estimation |
|---|---|
| Description: | Provides methods to estimate serial intervals and time-varying case reproduction numbers from infectious disease outbreak data. Serial intervals measure the time between symptom onset in linked transmission pairs, while case reproduction numbers quantify how many secondary cases each infected individual generates over time. These parameters are essential for understanding transmission dynamics, evaluating control measures, and informing public health responses. The package implements the maximum likelihood framework from Vink et al. (2014) <doi:10.1093/aje/kwu209> for serial interval estimation and the retrospective method from Wallinga & Lipsitch (2007) <doi:10.1098/rspb.2006.3754> for reproduction number estimation. Originally developed for scabies transmission analysis but applicable to other infectious diseases including influenza, COVID-19, and emerging pathogens. Designed for epidemiologists, public health researchers, and infectious disease modelers working with outbreak surveillance data. |
| Authors: | Kylie Ainslie [aut, cre, cph]
|
| Maintainer: | Kylie Ainslie <[email protected]> |
| License: | EUPL-1.2 |
| Version: | 0.4.0 |
| Built: | 2026-06-04 23:14:14 UTC |
| Source: | https://github.com/kylieainslie/mitey |
Fits the serial interval mixture model for each number of transmission routes
from 2 up to n_routes_max, computes model selection criteria (AIC and BIC)
for each fit, and prints a summary table to help identify the optimal number
of routes.
compare_n_routes(dat, n_routes_max = 8L, dist = "normal", n = 50, ...)compare_n_routes(dat, n_routes_max = 8L, dist = "normal", n = 50, ...)
dat |
numeric vector; index case-to-case (ICC) intervals in days. |
n_routes_max |
integer; maximum number of transmission routes to evaluate.
Models are fitted for |
dist |
character; assumed parametric family for the serial interval
distribution. Must be either |
n |
integer; number of EM algorithm iterations passed to
|
... |
additional arguments passed to |
For each value of n_routes in 2:n_routes_max, the function:
Calls si_estim to fit the mixture model and retrieve
the log-likelihood and estimated weights.
Counts the number of free parameters :
Normal distribution:
Gamma distribution:
Computes AIC and BIC:
Prints the fitted mixture plot via plot_si_fit_result.
The model with the lowest BIC (or AIC) is reported as the best-fitting number of routes.
A data frame (returned invisibly) with one row per value of
n_routes and the following columns:
Number of transmission routes fitted.
Estimated mean of the serial interval (days).
Estimated standard deviation of the serial interval (days).
Log-likelihood of the fitted model.
Akaike Information Criterion.
Bayesian Information Criterion.
Number of free parameters in the model.
Logical; whether the EM algorithm converged.
Number of EM iterations performed.
A summary table and the best n_routes according to BIC and AIC are
printed as side effects.
set.seed(42) icc_data <- c( abs(rnorm(30, mean = 0, sd = 2)), rnorm(80, mean = 12, sd = 3), rnorm(30, mean = 24, sd = 4) ) icc_data <- round(pmax(icc_data, 0)) # Compare 2 to 6 routes with normal distribution results <- compare_n_routes(icc_data, n_routes_max = 6, dist = "normal", n = 50) # Compare using gamma distribution results_gam <- compare_n_routes(icc_data, n_routes_max = 5, dist = "gamma", n = 50)set.seed(42) icc_data <- c( abs(rnorm(30, mean = 0, sd = 2)), rnorm(80, mean = 12, sd = 3), rnorm(30, mean = 24, sd = 4) ) icc_data <- round(pmax(icc_data, 0)) # Compare 2 to 6 routes with normal distribution results <- compare_n_routes(icc_data, n_routes_max = 6, dist = "normal", n = 50) # Compare using gamma distribution results_gam <- compare_n_routes(icc_data, n_routes_max = 5, dist = "gamma", n = 50)
Simulates epidemic incidence data with known reproduction numbers using the renewal equation framework. This function is useful for testing and validating reproduction number estimation methods, as it generates synthetic outbreaks with ground truth R values that can be compared against estimated values.
generate_synthetic_epidemic( true_r, si_mean, si_sd, si_dist = "gamma", initial_cases = 10 )generate_synthetic_epidemic( true_r, si_mean, si_sd, si_dist = "gamma", initial_cases = 10 )
true_r |
numeric vector; the true time-varying reproduction numbers. The length of this vector determines the number of days in the simulated epidemic |
si_mean |
numeric; the mean of the serial interval distribution in days |
si_sd |
numeric; the standard deviation of the serial interval distribution in days |
si_dist |
character; the distribution family for the serial interval. Must be either "gamma" (default) or "normal". Gamma is recommended as it naturally restricts to positive values |
initial_cases |
integer; the number of cases on the first day of the epidemic. Defaults to 10 |
The function implements the discrete renewal equation:
where is the expected number of new infections at time ,
is the incidence at time , is the reproduction number
at time , and is the probability mass function of the serial
interval distribution for interval .
New cases at each time point are drawn from a Poisson distribution with mean
, introducing realistic stochastic variation while maintaining
the specified reproduction number trajectory.
The serial interval distribution is truncated at the 99th percentile to avoid computationally expensive calculations for very long tails. For the normal distribution, the probability mass function is normalized to ensure proper probability weights.
This function is particularly useful for:
Validating reproduction number estimation methods
Testing the performance of epidemiological models
Generating realistic epidemic scenarios for research
Creating training data for machine learning approaches
A data frame with three columns:
date: Date sequence starting from "2023-01-01"
true_r: The input reproduction number values
incidence: Simulated daily case counts
Fraser C (2007). Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One, 2(8), e758.
Cori A, Ferguson NM, Fraser C, Cauchemez S (2013). A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology, 178(9), 1505-1512.
wallinga_lipsitch for reproduction
number estimation methods that can be applied to the generated data
# Simple epidemic with constant R = 1.5 constant_r <- rep(1.5, 30) epidemic1 <- generate_synthetic_epidemic( true_r = constant_r, si_mean = 7, si_sd = 3, si_dist = "gamma" ) head(epidemic1) # Epidemic with declining R (e.g., intervention effect) declining_r <- seq(2.0, 0.5, length.out = 50) epidemic2 <- generate_synthetic_epidemic( true_r = declining_r, si_mean = 5, si_sd = 2, si_dist = "gamma", initial_cases = 5 ) # Epidemic with seasonal pattern days <- 100 seasonal_r <- 1.2 + 0.5 * sin(2 * pi * (1:days) / 365 * 7) # Weekly seasonality epidemic3 <- generate_synthetic_epidemic( true_r = seasonal_r, si_mean = 6, si_sd = 2.5, si_dist = "normal" ) # Plot the results if (require(ggplot2)) { library(ggplot2) ggplot(epidemic1, aes(x = date)) + geom_col(aes(y = incidence), alpha = 0.7) + geom_line(aes(y = true_r * 10), color = "red") + labs(title = "Synthetic Epidemic", y = "Daily Incidence", subtitle = "Red line: True R × 10") }# Simple epidemic with constant R = 1.5 constant_r <- rep(1.5, 30) epidemic1 <- generate_synthetic_epidemic( true_r = constant_r, si_mean = 7, si_sd = 3, si_dist = "gamma" ) head(epidemic1) # Epidemic with declining R (e.g., intervention effect) declining_r <- seq(2.0, 0.5, length.out = 50) epidemic2 <- generate_synthetic_epidemic( true_r = declining_r, si_mean = 5, si_sd = 2, si_dist = "gamma", initial_cases = 5 ) # Epidemic with seasonal pattern days <- 100 seasonal_r <- 1.2 + 0.5 * sin(2 * pi * (1:days) / 365 * 7) # Weekly seasonality epidemic3 <- generate_synthetic_epidemic( true_r = seasonal_r, si_mean = 6, si_sd = 2.5, si_dist = "normal" ) # Plot the results if (require(ggplot2)) { library(ggplot2) ggplot(epidemic1, aes(x = date)) + geom_col(aes(y = incidence), alpha = 0.7) + geom_line(aes(y = true_r * 10), color = "red") + labs(title = "Synthetic Epidemic", y = "Daily Incidence", subtitle = "Red line: True R × 10") }
Creates a diagnostic plot showing the fitted serial interval mixture distribution overlaid on a histogram of observed index case-to-case (ICC) intervals from outbreak data.
plot_si_fit( dat, mean, sd, weights, dist = "normal", scaling_factor = 1, n_routes = 4L )plot_si_fit( dat, mean, sd, weights, dist = "normal", scaling_factor = 1, n_routes = 4L )
dat |
numeric vector; the index case-to-case (ICC) intervals in days. |
mean |
numeric; the estimated mean of the serial interval distribution in days. |
sd |
numeric; the estimated standard deviation of the serial interval distribution in days. |
weights |
numeric vector of length n_routes - 1; the estimated weights for each transmission route component, starting from Co-primary. The last route weight is derived internally as 1 - sum(weights). |
dist |
character; the distribution family. Must be either "normal" (default) or "gamma". |
scaling_factor |
numeric; multiplicative factor to adjust the height of the fitted density curve. Defaults to 1. |
n_routes |
integer; number of transmission routes modelled. Must be >= 2.
Defaults to 4. Must match the value used in |
A ggplot2 object.
Vink MA, Bootsma MCJ, Wallinga J (2014). Serial intervals of respiratory infectious diseases: A systematic review and analysis. American Journal of Epidemiology, 180(9), 865-875.
si_estim for serial interval estimation,
f_norm and f_gam for the underlying
mixture distribution functions
# Example 1: 4 routes, normal distribution set.seed(123) icc_data <- c( rnorm(20, mean = 0, sd = 2), rnorm(50, mean = 12, sd = 3), rnorm(20, mean = 24, sd = 4) ) icc_data <- round(pmax(icc_data, 0)) plot_si_fit( dat = icc_data, mean = 12.5, sd = 3.2, weights = c(0.2, 0.6, 0.15, 0.05), dist = "normal", n_routes = 4 ) # Example 2: 5 routes, gamma distribution plot_si_fit( dat = icc_data, mean = 12.0, sd = 3.5, weights = c(0.25, 0.65, 0.05, 0.03, 0.02), dist = "gamma", n_routes = 5, scaling_factor = 0.8 )# Example 1: 4 routes, normal distribution set.seed(123) icc_data <- c( rnorm(20, mean = 0, sd = 2), rnorm(50, mean = 12, sd = 3), rnorm(20, mean = 24, sd = 4) ) icc_data <- round(pmax(icc_data, 0)) plot_si_fit( dat = icc_data, mean = 12.5, sd = 3.2, weights = c(0.2, 0.6, 0.15, 0.05), dist = "normal", n_routes = 4 ) # Example 2: 5 routes, gamma distribution plot_si_fit( dat = icc_data, mean = 12.0, sd = 3.5, weights = c(0.25, 0.65, 0.05, 0.03, 0.02), dist = "gamma", n_routes = 5, scaling_factor = 0.8 )
A convenience wrapper for plot_si_fit that accepts the output from
si_estim directly, automatically handling the weight aggregation
for different distribution types and number of routes.
plot_si_fit_result( si_result, dat, dist = c("normal", "gamma"), scaling_factor = 1 )plot_si_fit_result( si_result, dat, dist = c("normal", "gamma"), scaling_factor = 1 )
si_result |
list; the output from |
dat |
numeric vector; the index case-to-case (ICC) intervals in days. |
dist |
character; the distribution family. Must be either "normal" (default)
or "gamma". Should match the distribution used in |
scaling_factor |
numeric; multiplicative factor to adjust the height of the fitted density curve. Defaults to 1. |
This function reads n_routes directly from the si_result object and
aggregates component weights automatically:
For normal distribution: the Co-primary weight is taken directly, and weights for routes 2 to n_routes are aggregated by summing the two symmetric components (positive and negative).
For gamma distribution: weights are taken directly as the first
n_routes - 1 values from si_result$wts.
A ggplot2 object.
set.seed(123) icc_data <- c( abs(rnorm(15, mean = 0, sd = 2)), rnorm(40, mean = 12, sd = 3), rnorm(15, mean = 24, sd = 4) ) icc_data <- round(pmax(icc_data, 0)) # 4 routes (default) result <- si_estim(icc_data, n = 50) plot_si_fit_result(result, icc_data, dist = "normal") # 5 routes result5 <- si_estim(icc_data, n = 50, n_routes = 5) plot_si_fit_result(result5, icc_data, dist = "normal")set.seed(123) icc_data <- c( abs(rnorm(15, mean = 0, sd = 2)), rnorm(40, mean = 12, sd = 3), rnorm(15, mean = 24, sd = 4) ) icc_data <- round(pmax(icc_data, 0)) # 4 routes (default) result <- si_estim(icc_data, n = 50) plot_si_fit_result(result, icc_data, dist = "normal") # 5 routes result5 <- si_estim(icc_data, n = 50, n_routes = 5) plot_si_fit_result(result5, icc_data, dist = "normal")
Estimates the mean and standard deviation of the serial interval distribution from outbreak data using the Expectation-Maximization (EM) algorithm developed by Vink et al. (2014). The serial interval is defined as the time between symptom onset in a primary case and symptom onset in a secondary case infected by that primary case.
si_estim( dat, n = 50, dist = "normal", init = NULL, tol = 1e-06, n_starts = 1, n_routes = 4L, wind = 1 )si_estim( dat, n = 50, dist = "normal", init = NULL, tol = 1e-06, n_starts = 1, n_routes = 4L, wind = 1 )
dat |
numeric vector; index case-to-case (ICC) intervals in days. |
n |
integer; number of EM algorithm iterations to perform. Defaults to 50. |
dist |
character; the assumed parametric family for the serial interval distribution. Must be either "normal" (default) or "gamma". |
init |
numeric vector of length 2; initial values for the mean and standard deviation. If NULL (default), uses the sample mean and sample standard deviation. |
tol |
numeric; convergence tolerance for the EM algorithm. Defaults to 1e-6. |
n_starts |
integer; number of random restarts for the EM algorithm. Defaults to 1. |
n_routes |
integer; number of transmission routes to model. Must be >= 2. Defaults to 4 (Co-Primary, Primary-Secondary, Primary-Tertiary, Primary-Quaternary). Increasing this allows modelling longer transmission chains. |
wind |
The window censure interval |
A named list containing:
mean: Estimated mean of the serial interval distribution (days)
sd: Estimated standard deviation of the serial interval distribution (days)
wts: Numeric vector of estimated component weights. Length is
2*n_routes - 1 for normal distribution, n_routes for gamma distribution.
converged: Logical indicating whether the algorithm converged.
iterations: Integer indicating the number of iterations performed.
loglik: Log-likelihood of the fitted model.
n_restarts: Number of restarts performed.
n_routes: Number of transmission routes used.
Vink MA, Bootsma MCJ, Wallinga J (2014). Serial intervals of respiratory infectious diseases: A systematic review and analysis. American Journal of Epidemiology, 180(9), 865-875. doi:10.1093/aje/kwu209
# Example 1: Basic usage with simulated data, default 4 routes set.seed(123) simulated_icc <- c( rep(1, 20), rep(2, 25), rep(3, 15), rep(4, 8) ) result <- si_estim(simulated_icc) # Example 2: Using 5 routes result_5routes <- si_estim(simulated_icc, n_routes = 5) # Example 3: Using gamma distribution with 3 routes result_gamma <- si_estim(simulated_icc, dist = "gamma", n_routes = 3)# Example 1: Basic usage with simulated data, default 4 routes set.seed(123) simulated_icc <- c( rep(1, 20), rep(2, 25), rep(3, 15), rep(4, 8) ) result <- si_estim(simulated_icc) # Example 2: Using 5 routes result_5routes <- si_estim(simulated_icc, n_routes = 5) # Example 3: Using gamma distribution with 3 routes result_gamma <- si_estim(simulated_icc, dist = "gamma", n_routes = 3)
Applies temporal smoothing to reproduction number estimates using a centered moving average window. Handles missing and infinite values appropriately.
smooth_estimates(r_estimate, window)smooth_estimates(r_estimate, window)
r_estimate |
numeric vector; reproduction number estimates to smooth. Can contain NA or infinite values |
window |
integer; size of the smoothing window in time units. Window is centered around each point |
numeric vector; smoothed reproduction number estimates of the same length as input. Returns NA for points with insufficient valid neighboring values
Estimates the time-varying case reproduction number (R_c) from daily incidence data using the method developed by Wallinga and Lipsitch (2007). The case reproduction number represents the average number of secondary infections generated by cases with symptom onset at time t, making it useful for retrospective outbreak analysis.
wallinga_lipsitch( incidence, dates, si_mean, si_sd, si_dist = "gamma", smoothing = 0, bootstrap = FALSE, n_bootstrap = 1000, conf_level = 0.95, shift = FALSE )wallinga_lipsitch( incidence, dates, si_mean, si_sd, si_dist = "gamma", smoothing = 0, bootstrap = FALSE, n_bootstrap = 1000, conf_level = 0.95, shift = FALSE )
incidence |
numeric vector; daily case counts. Must be non-negative integers
or counts. Length must match |
dates |
vector; dates corresponding to each incidence count. Must be the same
length as |
si_mean |
numeric; mean of the serial interval distribution in days. Must be positive. Typically estimated from contact tracing data or literature |
si_sd |
numeric; standard deviation of the serial interval distribution in days. Must be positive |
si_dist |
character; distribution family for the serial interval. Options:
|
smoothing |
integer; window size for temporal smoothing of R estimates. Use 0 for no smoothing (default), or positive integers for moving average smoothing over the specified number of days |
bootstrap |
logical; whether to calculate bootstrap confidence intervals.
Defaults to |
n_bootstrap |
integer; number of bootstrap samples when |
conf_level |
numeric; confidence level for bootstrap intervals, between 0 and 1. Defaults to 0.95 (95% confidence intervals) |
shift |
logical; whether to shift R estimates forward by one mean serial interval.
When |
The method calculates the relative likelihood that each earlier case infected each later case based on their time differences and the serial interval distribution, then aggregates these likelihoods to estimate reproduction numbers. The approach makes minimal assumptions beyond specifying the serial interval distribution.
Key features:
Pairwise likelihood approach: Considers all epidemiologically plausible transmission pairs (earlier to later cases)
Right-truncation correction: Adjusts for unobserved future cases (see calculate_truncation_correction)
Bootstrap confidence intervals: Quantifies estimation uncertainty
Temporal shifting: Optional alignment with instantaneous R estimates
Flexible smoothing: User-controlled temporal smoothing of estimates
The Wallinga-Lipsitch method estimates the case reproduction number by:
Computing transmission likelihoods from each earlier case to each later case based on the serial interval distribution
Normalizing these likelihoods so they sum to 1 for each potential infectee
Aggregating normalized likelihoods to estimate expected secondary cases per primary case
Applying corrections for right-truncation bias
Right-truncation correction accounts for secondary cases that may occur
after the observation period ends (see calculate_truncation_correction). This correction is particularly important for recent cases in the time series.
Bootstrap confidence intervals are calculated by resampling individual cases with replacement, providing non-parametric uncertainty estimates that account for both Poisson sampling variation and method uncertainty.
A data frame with the following columns:
date: Original input dates
incidence: Original input case counts
R: Estimated case reproduction number
R_corrected: Case reproduction number with right-truncation correction
R_lower, R_upper: Bootstrap confidence intervals for R (if bootstrap = TRUE)
R_corrected_lower, R_corrected_upper: Bootstrap confidence intervals for
R_corrected (if bootstrap = TRUE)
shifted_date: Dates shifted forward by mean serial interval (if shift = TRUE)
The case reproduction number differs from the instantaneous reproduction
number in timing: R_c reflects the reproductive potential of cases by their
symptom onset date, while instantaneous R reflects transmission potential
at the time of infection. Use shift = TRUE for comparisons with
instantaneous R estimates.
Wallinga J, Lipsitch M (2007). How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences, 274(1609), 599-604. doi:10.1098/rspb.2006.3754
si_estim for serial interval estimation,
generate_synthetic_epidemic for testing data,
calculate_truncation_correction for right-truncation correction
# Example 1: Basic usage with synthetic data set.seed(123) dates <- seq(as.Date("2023-01-01"), by = "day", length.out = 30) incidence <- c(1, 2, 4, 7, 12, 15, 18, 20, 22, 19, 16, 14, 11, 9, 7, 5, 4, 3, 2, 1, rep(0, 10)) # Estimate reproduction number result <- wallinga_lipsitch( incidence = incidence, dates = dates, si_mean = 7, si_sd = 3, si_dist = "gamma" ) # View results head(result) # Example 2: With bootstrap confidence intervals result_ci <- wallinga_lipsitch( incidence = incidence, dates = dates, si_mean = 7, si_sd = 3, si_dist = "gamma", bootstrap = TRUE, n_bootstrap = 500 # Reduced for faster example ) # Plot results with confidence intervals if (require(ggplot2)) { library(ggplot2) ggplot(result_ci, aes(x = date)) + geom_ribbon(aes(ymin = R_corrected_lower, ymax = R_corrected_upper), alpha = 0.3, fill = "blue") + geom_line(aes(y = R_corrected), color = "blue", size = 1) + geom_hline(yintercept = 1, linetype = "dashed", color = "red") + labs(x = "Date", y = "Reproduction Number", title = "Time-varying Reproduction Number") + theme_minimal() } # Example 3: With smoothing and shifting result_smooth <- wallinga_lipsitch( incidence = incidence, dates = dates, si_mean = 7, si_sd = 3, si_dist = "gamma", smoothing = 7, # 7-day smoothing window shift = TRUE # Shift for comparison with instantaneous R ) # Example 4: Using normal distribution for serial interval result_normal <- wallinga_lipsitch( incidence = incidence, dates = dates, si_mean = 6, si_sd = 2, si_dist = "normal", smoothing = 5 )# Example 1: Basic usage with synthetic data set.seed(123) dates <- seq(as.Date("2023-01-01"), by = "day", length.out = 30) incidence <- c(1, 2, 4, 7, 12, 15, 18, 20, 22, 19, 16, 14, 11, 9, 7, 5, 4, 3, 2, 1, rep(0, 10)) # Estimate reproduction number result <- wallinga_lipsitch( incidence = incidence, dates = dates, si_mean = 7, si_sd = 3, si_dist = "gamma" ) # View results head(result) # Example 2: With bootstrap confidence intervals result_ci <- wallinga_lipsitch( incidence = incidence, dates = dates, si_mean = 7, si_sd = 3, si_dist = "gamma", bootstrap = TRUE, n_bootstrap = 500 # Reduced for faster example ) # Plot results with confidence intervals if (require(ggplot2)) { library(ggplot2) ggplot(result_ci, aes(x = date)) + geom_ribbon(aes(ymin = R_corrected_lower, ymax = R_corrected_upper), alpha = 0.3, fill = "blue") + geom_line(aes(y = R_corrected), color = "blue", size = 1) + geom_hline(yintercept = 1, linetype = "dashed", color = "red") + labs(x = "Date", y = "Reproduction Number", title = "Time-varying Reproduction Number") + theme_minimal() } # Example 3: With smoothing and shifting result_smooth <- wallinga_lipsitch( incidence = incidence, dates = dates, si_mean = 7, si_sd = 3, si_dist = "gamma", smoothing = 7, # 7-day smoothing window shift = TRUE # Shift for comparison with instantaneous R ) # Example 4: Using normal distribution for serial interval result_normal <- wallinga_lipsitch( incidence = incidence, dates = dates, si_mean = 6, si_sd = 2, si_dist = "normal", smoothing = 5 )