| Title: | Estimate the Effective Reproductive Number with Trend Filtering |
|---|---|
| Description: | Use trend filtering, a type of regularized nonparametric regression, to estimate the instantaneous reproduction number, also called Rt. This value roughly says how many new infections will result from each new infection today. Values larger than 1 indicate that an epidemic is growing while those less than 1 indicate decline. For more details about this methodology, see Liu, Cai, Gustafson, and McDonald (2024) <doi:10.1371/journal.pcbi.1012324>. |
| Authors: | Daniel J. McDonald [aut, cre, cph], Jiaping Liu [aut], Zhenglun Cai [ctb] |
| Maintainer: | Daniel J. McDonald <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.2.9001 |
| Built: | 2026-05-14 20:21:52 UTC |
| Source: | https://github.com/dajmcdon/rtestim |
This dataset contains 3+ years of incident COVID-19 case counts as reported by opencovid.ca as of July 4, 2023.
cancovidcancovid
A data frame with 1,253 rows and 2 columns:
The observed date. A Date object.
The number of new recorded cases for this date.
This data is available under the CC-BY-4.0 License.
See:
Berry, I., O’Neill, M., Sturrock, S. L., Wright, J. E., Acharya, K., Brankston, G., Harish, V., Kornas, K., Maani, N., Naganathan, T., Obress, L., Rossi, T., Simmons, A. E., Van Camp, M., Xie, X., Tuite, A. R., Greer, A. L., Fisman, D. N., & Soucy, J.-P. R. (2021). A sub-national real-time epidemiological and vaccination database for the COVID-19 pandemic in Canada. Scientific Data, 8(1). doi:10.1038/s41597-021-00955-2
Create an approximate confidence band for the Rt or incidence estimate. Note that the variance computation is approximate.
confband(object, lambda, level = 0.95, type = c("Rt", "Yt"), ...)confband(object, lambda, level = 0.95, type = c("Rt", "Yt"), ...)
object |
a |
lambda |
the selected lambda. May be a scalar value, or in the case of
|
level |
the desired confidence level(s). These will be sorted if necessary. |
type |
the type |
... |
additional arguments for methods. Unused. |
A data.frame containing the estimates Rt or Yt at the chosen
lambda, and confidence limits corresponding to level
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, nsol = 10) head(confband(out, out$lambda[2])) head(confband(out, out$lambda[2], level = c(0.95, 0.8, 0.5))) cv <- cv_estimate_rt(y, nfold = 3, nsol = 30) head(confband(cv, "lambda.min", c(0.5, 0.9)))y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, nsol = 10) head(confband(out, out$lambda[2])) head(confband(out, out$lambda[2], level = c(0.95, 0.8, 0.5))) cv <- cv_estimate_rt(y, nfold = 3, nsol = 30) head(confband(cv, "lambda.min", c(0.5, 0.9)))
Rt estimation algorithm configuration
configure_rt_admm( rho = -1, alpha = 0.5, gamma = 0.9, tolerance = 1e-04, maxiter_newton = 50L, maxiter_line = 20L, verbose = 0, ... )configure_rt_admm( rho = -1, alpha = 0.5, gamma = 0.9, tolerance = 1e-04, maxiter_newton = 50L, maxiter_line = 20L, verbose = 0, ... )
rho |
Double. An ADMM parameter; coefficient of augmented term in the Lagrangian function. |
alpha |
Double. A parameter adjusting upper bound in line search algorithm
in |
gamma |
Double. A parameter adjusting step size in line search algorithm
in |
tolerance |
Double. Tolerance of ADMM convergence. |
maxiter_newton |
Integer. Maximum number of iterations for the outer Newton iteration. |
maxiter_line |
Integer. Maximum number of iterations for the linesearch algorithm in the proximal Newton method. |
verbose |
Integer. |
... |
space for future extensions |
a list of model parameters with class rt_admm_configuration
configure_rt_admm() configure_rt_admm(tolerance = 1e-6, verbose = 1L)configure_rt_admm() configure_rt_admm(tolerance = 1e-6, verbose = 1L)
Leave-kth-out cross validation for choosing a optimal parameter lambda
cv_estimate_rt( observed_counts, korder = 3L, dist_gamma = c(2.5, 2.5), nfold = 3L, error_measure = c("deviance", "mse", "mae"), x = 1:n, lambda = NULL, maxiter = 1000000L, delay_distn = NULL, delay_distn_periodicity = NULL, regular_splits = FALSE, invert_splits = FALSE, ... )cv_estimate_rt( observed_counts, korder = 3L, dist_gamma = c(2.5, 2.5), nfold = 3L, error_measure = c("deviance", "mse", "mae"), x = 1:n, lambda = NULL, maxiter = 1000000L, delay_distn = NULL, delay_distn_periodicity = NULL, regular_splits = FALSE, invert_splits = FALSE, ... )
observed_counts |
vector of the observed daily infection counts |
korder |
Integer. Degree of the piecewise polynomial curve to be
estimated. For example, |
dist_gamma |
Vector of length 2. These are the shape and scale for the assumed serial interval distribution. Roughly, this distribution describes the probability of an infectious individual infecting someone else after some period of time after having become infectious. As in most literature, we assume that this interval follows a gamma distribution with some shape and scale. |
nfold |
Integer. This number of folds to conduct the leave-kth-out
cross validation. For leave-kth-out cross validation, every kth
observed_counts and their corresponding position (evenly or unevenly
spaced) are placed into the same fold. The first and last observed_counts are
not assigned to any folds. Smallest allowable value is |
error_measure |
Metric used to calculate cross validation scores.
Must be choose from |
x |
a vector of positions at which the counts have been observed. In an ideal case, we would observe data at regular intervals (e.g. daily or weekly) but this may not always be the case. May be numeric or Date. |
lambda |
Vector. A user supplied sequence of tuning parameters which
determines the balance between data fidelity and
smoothness of the estimated Rt; larger |
maxiter |
Integer. Maximum number of iterations for the estimation algorithm. |
delay_distn |
in the case of a non-gamma delay distribution,
a vector or matrix (or |
delay_distn_periodicity |
Controls the relationship between the spacing
of the computed delay distribution and the spacing of |
regular_splits |
Logical.
If
where 0 indicates no assignment.
Therefore, the folds are not random and running |
invert_splits |
Logical.
Typical K-fold CV would use K-1 folds for the training
set while reserving 1 fold for evaluation (repeating the split K times).
Setting this to true inverts this process, using a much smaller training
set with a larger evaluation set. This tends to result in larger values
of |
... |
additional parameters passed to |
An object with S3 class "cv_poisson_rt". Among the list components:
full_fit An object with S3 class "poisson_rt", fitted with all
observed_counts and lambda
cv_scores leave-kth-out cross validation scores
cv_se leave-kth-out cross validation standard error
lambda.min lambda which achieved the optimal cross validation score
lambda.1se lambda that gives the optimal cross validation score
within one standard error.
lambda the value of lambda used in the algorithm.
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) cv <- cv_estimate_rt(y, korder = 3, nfold = 3, nsol = 30) cvy <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) cv <- cv_estimate_rt(y, korder = 3, nfold = 3, nsol = 30) cv
The total infectiousness at each observed time point is calculated
by , where denotes the vector containing
observed incidence, and denotes the generation interval
distribution. Typically, the generation interval is challenging to estimate
from data, so the serial interval is used instead. The serial interval
distribution expresses the probability
of a secondary infection caused by a primary infection which occurred
days earlier.
delay_calculator( observed_counts, x = NULL, dist_gamma = c(2.5, 2.5), delay_distn = NULL, delay_distn_periodicity = NULL, xout = x )delay_calculator( observed_counts, x = NULL, dist_gamma = c(2.5, 2.5), delay_distn = NULL, delay_distn_periodicity = NULL, xout = x )
observed_counts |
vector of the observed daily infection counts |
x |
a vector of positions at which the counts have been observed. In an ideal case, we would observe data at regular intervals (e.g. daily or weekly) but this may not always be the case. May be numeric or Date. |
dist_gamma |
Vector of length 2. These are the shape and scale for the assumed serial interval distribution. Roughly, this distribution describes the probability of an infectious individual infecting someone else after some period of time after having become infectious. As in most literature, we assume that this interval follows a gamma distribution with some shape and scale. |
delay_distn |
in the case of a non-gamma delay distribution,
a vector or matrix (or |
delay_distn_periodicity |
Controls the relationship between the spacing
of the computed delay distribution and the spacing of |
xout |
a vector of positions at which the results should be returned.
By default, this will be the same as |
A vector containing the total infectiousness at each
point xout.
delay_calculator(c(3, 2, 5, 3, 1), dist_gamma = c(2.5, 2.5))delay_calculator(c(3, 2, 5, 3, 1), dist_gamma = c(2.5, 2.5))
The serial interval distribution expresses the probability of the symptom onset of a secondary infection occurred a given number of days after the primary infection. The serial interval distribution is commonly represented by a discretized Gamma distribution in literature, parametrized by the shape and scale parameters.
discretize_gamma(x, shape = 2.5, scale = 2.5, rate = 1/scale)discretize_gamma(x, shape = 2.5, scale = 2.5, rate = 1/scale)
x |
locations (times) where cases are observed. Must be nonnegative. |
shape, scale
|
shape and scale parameters. Must be positive,
|
rate |
an alternative way to specify the scale. |
probability mass of the discretized gamma distribution
discretize_gamma(1:30, shape = 1, scale = 1)discretize_gamma(1:30, shape = 1, scale = 1)
The Effective Reproduction Number of an infectious
disease can be estimated by solving the smoothness penalized Poisson
regression (trend filtering) of the form:
where , is the observed case count at day
, is the weighted past counts at day ,
is the smoothness penalty, and is the -th order
difference matrix.
estimate_rt( observed_counts, korder = 3L, dist_gamma = c(2.5, 2.5), x = 1:n, lambda = NULL, nsol = 50L, delay_distn = NULL, delay_distn_periodicity = NULL, lambdamin = NULL, lambdamax = NULL, lambda_min_ratio = 1e-04, maxiter = 1e+05, init = configure_rt_admm() )estimate_rt( observed_counts, korder = 3L, dist_gamma = c(2.5, 2.5), x = 1:n, lambda = NULL, nsol = 50L, delay_distn = NULL, delay_distn_periodicity = NULL, lambdamin = NULL, lambdamax = NULL, lambda_min_ratio = 1e-04, maxiter = 1e+05, init = configure_rt_admm() )
observed_counts |
vector of the observed daily infection counts |
korder |
Integer. Degree of the piecewise polynomial curve to be
estimated. For example, |
dist_gamma |
Vector of length 2. These are the shape and scale for the assumed serial interval distribution. Roughly, this distribution describes the probability of an infectious individual infecting someone else after some period of time after having become infectious. As in most literature, we assume that this interval follows a gamma distribution with some shape and scale. |
x |
a vector of positions at which the counts have been observed. In an ideal case, we would observe data at regular intervals (e.g. daily or weekly) but this may not always be the case. May be numeric or Date. |
lambda |
Vector. A user supplied sequence of tuning parameters which
determines the balance between data fidelity and
smoothness of the estimated Rt; larger |
nsol |
Integer. The number of tuning parameters |
delay_distn |
in the case of a non-gamma delay distribution,
a vector or matrix (or |
delay_distn_periodicity |
Controls the relationship between the spacing
of the computed delay distribution and the spacing of |
lambdamin |
Optional value for the smallest |
lambdamax |
Optional value for the largest |
lambda_min_ratio |
If neither |
maxiter |
Integer. Maximum number of iterations for the estimation algorithm. |
init |
a list of internal configuration parameters of class
|
An object with S3 class poisson_rt. Among the list components:
observed_counts the observed daily infection counts.
x a vector of positions at which the counts have been observed.
weighted_past_counts the weighted sum of past infection counts.
Rt the estimated effective reproduction rate. This is a matrix with
each column corresponding to one value of lambda.
lambda the values of lambda actually used in the algorithm.
korder degree of the estimated piecewise polynomial curve.
dof degrees of freedom of the estimated trend filtering problem.
niter the required number of iterations for each value of lambda.
convergence if number of iterations for each value of lambda is less
than the maximum number of iterations for the estimation algorithm.
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y) out plot(out) out0 <- estimate_rt(y, korder = 0L, nsol = 40) out0 plot(out0)y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y) out plot(out) out0 <- estimate_rt(y, korder = 0L, nsol = 40) out0 plot(out0)
Fitted cv_poisson_rt
## S3 method for class 'cv_poisson_rt' fitted(object, which_lambda = c("lambda.min", "lambda.1se"), ...)## S3 method for class 'cv_poisson_rt' fitted(object, which_lambda = c("lambda.min", "lambda.1se"), ...)
object |
result of cross validation of type |
which_lambda |
select which Rt's to output. If not provided, all Rt's are returned. If provided a list of lambda,the corresponding Rt estimation will be returned. If provided a string, it must be either one of
|
... |
not used. |
Rt's estimated from provided lambda
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) cv <- cv_estimate_rt(y, korder = 3, nfold = 3, nsol = 30) f <- fitted(cv) f <- fitted(cv, which_lambda = cv$lambda[1]) f <- fitted(cv, which_lambda = "lambda.1se") f <- fitted(cv, which_lambda = NULL)y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) cv <- cv_estimate_rt(y, korder = 3, nfold = 3, nsol = 30) f <- fitted(cv) f <- fitted(cv, which_lambda = cv$lambda[1]) f <- fitted(cv, which_lambda = "lambda.1se") f <- fitted(cv, which_lambda = NULL)
Interpolate (or extrapolate) Rt estimates to intermediate design points
interpolate_rt(object, xout, ...) ## S3 method for class 'cv_poisson_rt' interpolate_rt(object, xout, which_lambda = c("lambda.min", "lambda.1se"), ...) ## S3 method for class 'poisson_rt' interpolate_rt(object, xout, lambda = NULL, ...)interpolate_rt(object, xout, ...) ## S3 method for class 'cv_poisson_rt' interpolate_rt(object, xout, which_lambda = c("lambda.min", "lambda.1se"), ...) ## S3 method for class 'poisson_rt' interpolate_rt(object, xout, lambda = NULL, ...)
object |
A fitted object produced by |
xout |
a vector of new positions at which Rt should be produced, but where counts may not have been observed. |
... |
additional arguments passed to methods. |
which_lambda |
Select which lambdas from the object to use. If not provided, all Rt's are returned. Note that new lambdas not originally used in the estimation procedure may be provided, but the results will be calculated by linearly interpolating the estimated Rt's. The strings |
lambda |
Vector. A user supplied sequence of tuning parameters which
determines the balance between data fidelity and
smoothness of the estimated Rt; larger |
A vector or matrix of interpolated Rt estimates.
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y) # originally estimated at out$x # get the Rt at 3 new points (for all estimated lambdas) int <- interpolate_rt(out, c(10.5, 11.5, 12.5)) # get the Rt at a single value of lambda interpolate_rt(out, c(10.5, 11.5, 12.5), lambda = out$lambda[20]) y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, nsol = 10) interpolate_rt(out, xout = c(1.5, 2.5))y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y) # originally estimated at out$x # get the Rt at 3 new points (for all estimated lambdas) int <- interpolate_rt(out, c(10.5, 11.5, 12.5)) # get the Rt at a single value of lambda interpolate_rt(out, c(10.5, 11.5, 12.5), lambda = out$lambda[20]) y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, nsol = 10) interpolate_rt(out, xout = c(1.5, 2.5))
Plot cv_poisson_rt
## S3 method for class 'cv_poisson_rt' plot(x, which_lambda = c("cv_scores", "lambda.min", "lambda.1se"), ...)## S3 method for class 'cv_poisson_rt' plot(x, which_lambda = c("cv_scores", "lambda.min", "lambda.1se"), ...)
x |
result of cv_estimate_rt of class |
which_lambda |
select which Rt's to plot. If not provided, the cross validation score will be plotted. If provided a list of lambda, the corresponding Rt estimation will be plotted. If provided a string, it
must be either one of
|
... |
Not used. |
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) cv <- cv_estimate_rt(y, korder = 1, nfold = 3, nsol = 30) plot(cv) plot(cv, which_lambda = cv$lambda[1]) plot(cv, which_lambda = "lambda.min") plot(cv, which_lambda = "lambda.1se") plot(cv, NULL)y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) cv <- cv_estimate_rt(y, korder = 1, nfold = 3, nsol = 30) plot(cv) plot(cv, which_lambda = cv$lambda[1]) plot(cv, which_lambda = "lambda.min") plot(cv, which_lambda = "lambda.1se") plot(cv, NULL)
poisson_rt objectProduces a figure showing some or all estimated Rt values for different
values of the penalty. The result is a ggplot2::ggplot(). Additional user
modifications can be added as desired.
## S3 method for class 'poisson_rt' plot(x, lambda = NULL, ...)## S3 method for class 'poisson_rt' plot(x, lambda = NULL, ...)
x |
output of the function |
lambda |
select which Rt's to plot. If not provided, all Rt's are plotted. |
... |
Not used. |
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, lambda = log(c(1.1, 1.3, 1.5))) plot(out)y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, lambda = log(c(1.1, 1.3, 1.5))) plot(out)
Produces a figure showing a single estimated Rt value along with approximate
confidence bands. The result is a ggplot2::ggplot(). Additional user
modifications can be added as desired.
## S3 method for class 'rt_confidence_band' plot(x, colour = "#3A448F", ...)## S3 method for class 'rt_confidence_band' plot(x, colour = "#3A448F", ...)
x |
An object of class |
colour |
The colour of the desired plot |
... |
Not used. |
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, nsol = 10) cb <- confband(out, out$lambda[2], level = c(0.95, 0.8, 0.5)) plot(cb) cb_y <- confband(out, out$lambda[2], level = c(0.95, 0.8, 0.5), type = "Yt") plot(cb_y)y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, nsol = 10) cb <- confband(out, out$lambda[2], level = c(0.95, 0.8, 0.5)) plot(cb) cb_y <- confband(out, out$lambda[2], level = c(0.95, 0.8, 0.5), type = "Yt") plot(cb_y)
Given an object of class poisson_rt produced with estimate_rt(),
calculate predicted observed cases for the estimated Rt values.
Note: This function is not intended for "new x" or to produce forecasts, but
rather to examine how Rt relates to observables.
## S3 method for class 'cv_poisson_rt' predict(object, which_lambda = c("lambda.min", "lambda.1se"), ...)## S3 method for class 'cv_poisson_rt' predict(object, which_lambda = c("lambda.min", "lambda.1se"), ...)
object |
result of cross validation of type |
which_lambda |
Select which lambdas from the object to use. If not provided, all Rt's are returned. Note that new lambdas not originally used in the estimation procedure may be provided, but the results will be calculated by linearly interpolating the estimated Rt's. The strings |
... |
not used. |
A vector or matrix of predicted case counts.
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) cv <- cv_estimate_rt(y, korder = 3, nfold = 3, nsol = 30) p <- predict(cv) p <- predict(cv, which_lambda = cv$lambda[1]) p <- predict(cv, which_lambda = "lambda.1se") p <- predict(cv, which_lambda = NULL) plot(y) matlines(p, lty = 2)y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) cv <- cv_estimate_rt(y, korder = 3, nfold = 3, nsol = 30) p <- predict(cv) p <- predict(cv, which_lambda = cv$lambda[1]) p <- predict(cv, which_lambda = "lambda.1se") p <- predict(cv, which_lambda = NULL) plot(y) matlines(p, lty = 2)
Given an object of class poisson_rt produced with estimate_rt(),
calculate predicted observed cases for the estimated Rt values.
Note: This function is not intended for "new x" or to produce forecasts, but
rather to examine how Rt relates to observables.
## S3 method for class 'poisson_rt' predict(object, lambda = NULL, ...)## S3 method for class 'poisson_rt' predict(object, lambda = NULL, ...)
object |
An object of class |
lambda |
Select which lambdas from the object to use. If not provided (the default), all are returned. Note that new lambdas not originally used in the estimation procedure may be provided, but the results will be calculated by linearly interpolating the estimated Rt's. |
... |
Not used. |
A vector or matrix of predicted case counts.
y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, nsol = 10) preds <- predict(out) plot(y) matlines(preds, lty = 1)y <- c(1, rpois(100, dnorm(1:100, 50, 15) * 500 + 1)) out <- estimate_rt(y, nsol = 10) preds <- predict(out) plot(y) matlines(preds, lty = 1)