Standardizes dates from multiple years to enable comparison of epidemic curves and visualization of seasonal patterns in infectious disease surveillance data. Commonly used for creating periodicity plots of respiratory diseases like influenza, RSV, or COVID-19.

Usage

align_dates_seasonal(
  x,
  dates_from = NULL,
  date_resolution = c("week", "isoweek", "epiweek", "day", "month"),
  start = NULL,
  target_year = NULL,
  drop_leap_week = TRUE
)

align_and_bin_dates_seasonal(
  x,
  n = 1,
  dates_from,
  population = 1,
  fill_gaps = FALSE,
  date_resolution = c("week", "isoweek", "epiweek", "day", "month"),
  start = NULL,
  target_year = NULL,
  drop_leap_week = TRUE
)

Arguments

x

Either a data frame with a date column, or a date vector.
Supported inputs are date and datetime objects as well as commonly used character formats:

  • ISO dates "2024-03-09"

  • Month "2024-03"

  • Week "2024-W09" or "2024-W09-1"

dates_from

Column name containing the dates to align. Used when x is a data.frame.

date_resolution

Character string specifying the temporal resolution. One of:

  • "week" or "isoweek" - Calendar weeks (ISO 8601), reporting weeks as used by the ECDC.

  • "epiweek" - Epidemiological weeks (US CDC), i.e. ISO weeks with Sunday as week start.

  • "month" - Calendar months

  • "day" - Daily resolution
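The difference between the two week definitions can be illustrated with a minimal base R sketch. The helper `week_number()` is hypothetical and not part of the package; it assumes that the fourth day of a week (Thursday for Monday-start ISO weeks, Wednesday for Sunday-start epiweeks) decides which year the week belongs to.

```r
# Hypothetical helper (not part of the package): week number for weeks
# starting on Monday (week_start = 1, ISO 8601) or Sunday (week_start = 7).
week_number <- function(date, week_start = 1) {
  date <- as.Date(date)
  # Position of the date within its week: 1 = first day, 7 = last day
  pos <- (as.POSIXlt(date)$wday - week_start) %% 7 + 1
  # The fourth day of the week determines the year the week is counted in
  pivot <- date + (4 - pos)
  # Zero-based day of the year of the pivot day, converted to a week number
  as.POSIXlt(pivot)$yday %/% 7 + 1
}

d <- as.Date(c("2024-03-09", "2024-03-10")) # a Saturday and a Sunday
week_number(d, week_start = 1) # ISO weeks: 10 10
week_number(d, week_start = 7) # epiweeks: 10 11 (Sunday opens the next week)
```

The two definitions can also disagree around New Year: in this sketch, 2025-12-31 falls in epiweek 53 but in ISO week 1.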

start

Numeric value indicating epidemic season start:

  • For week/epiweek: week number (default: 28, approximately July)

  • For month: month number (default: 7 for July)

  • For day: day of year (default: 150, approximately the end of May)
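As a rough illustration of how a season start splits calendar years into seasons, the following base R sketch derives a season label from a year and week number. The function `season_label()` is a hypothetical helper written for this example; it is not the package's implementation and only assumes that weeks at or after `start` open a new season.

```r
# Hypothetical helper (not part of the package): label the season a given
# calendar year and week number belong to, for a season starting in week `start`.
season_label <- function(year, week, start = 28) {
  # Weeks before the season start still belong to the season that began
  # in the previous calendar year
  start_year <- ifelse(week >= start, year, year - 1)
  sprintf("%d/%02d", start_year, (start_year + 1) %% 100)
}

season_label(year = c(2023, 2024, 2024), week = c(40, 5, 28))
# "2023/24" "2023/24" "2024/25"
```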

target_year

Numeric value for the reference year to align dates to. Defaults to the start year of the most recent season in the data, so that the most recent dates stay unchanged.

drop_leap_week

If TRUE and date_resolution is week, isoweek or epiweek, leap weeks (week 53) are dropped unless they fall in the most recent season. Set to FALSE if week 53 data should be retained. Dropping week 53 from historical data is the most common approach. Otherwise, historical data for week 53 would be mapped to week 52 whenever the target season has no leap week, so that week 52 would contain the case counts of two weeks.
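The double-counting problem can be reproduced with a small base R sketch. The counts below are made-up toy numbers, and the collapsing step is only an assumption about what mapping week 53 onto week 52 would look like, not the package's code.

```r
# Toy historical season with a leap week (made-up counts)
historical <- data.frame(week = c(51, 52, 53), n = c(10, 12, 9))
max_week_target <- 52 # assumed: the target season has no week 53

# Option 1: map week 53 onto week 52, so week 52 holds two weeks of cases
collapsed <- historical
collapsed$week <- pmin(collapsed$week, max_week_target)
collapsed_counts <- tapply(collapsed$n, collapsed$week, sum)

# Option 2: drop the leap week, so week 52 keeps its original count
dropped <- historical[historical$week <= max_week_target, ]
dropped_counts <- tapply(dropped$n, dropped$week, sum)

collapsed_counts[["52"]] # 21
dropped_counts[["52"]]   # 12
```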

n

Numeric column with case counts. Supports quoted and unquoted column names.

population

A number or a numeric column with the population size. Used to calculate the incidence.

fill_gaps

Logical; if TRUE, gaps in the time series are filled with 0 cases.

Value

A data frame with standardized date columns:

  • year: Calendar year from original date

  • week/month/day: Time unit based on chosen resolution

  • date_aligned: Date standardized to target year

  • season: Epidemic season identifier (e.g., "2023/24")

  • current_season: Logical flag for most recent season

Binning also creates the columns:

  • n: Sum of cases in bin

  • incidence: Incidence calculated using n/population
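The binning columns can be approximated in base R as follows. This is a sketch of the idea (sum cases per bin, optionally fill empty bins with 0, divide by the population) using made-up data and a hypothetical constant population; it is not the package's implementation.

```r
# Toy line list aggregated per week (made-up counts)
cases <- data.frame(week = c(1, 1, 2, 4), n = c(3, 2, 5, 1))
population <- 1000 # assumed constant population size

# Sum of cases in each bin
binned <- aggregate(n ~ week, data = cases, FUN = sum)

# Fill gaps: add weeks without cases and set their count to 0
all_weeks <- data.frame(week = seq(min(binned$week), max(binned$week)))
filled <- merge(all_weeks, binned, by = "week", all.x = TRUE)
filled$n[is.na(filled$n)] <- 0

# Incidence calculated as n / population
filled$incidence <- filled$n / population
filled
```

Without the gap-filling step, week 3 would simply be missing from the result rather than appearing with 0 cases.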

Details

These functions help create standardized epidemic curves by aligning surveillance data from different years. This enables:

  • Comparison of disease patterns across multiple seasons

  • Identification of typical seasonal trends

  • Detection of unusual disease activity

  • Assessment of current season against historical patterns

The alignment can be done at different temporal resolutions (daily, weekly, monthly) with customizable season start points to match different disease patterns or surveillance protocols.
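To show the idea of alignment at monthly resolution, here is a minimal base R sketch. The function `align_month()` is hypothetical and only illustrates the concept under the assumption that months at or after the season start are placed in the target year and earlier months in the following year; the package may handle details differently.

```r
# Hypothetical helper (not part of the package): move dates onto one
# reference season so that all seasons share a common date axis.
align_month <- function(date, start = 7, target_year = 2024) {
  date <- as.Date(date)
  month <- as.POSIXlt(date)$mon + 1
  # Months from the season start onwards go into the target year,
  # earlier months into the following calendar year
  aligned_year <- ifelse(month >= start, target_year, target_year + 1)
  as.Date(sprintf("%d-%02d-01", aligned_year, month))
}

aligned <- align_month(c("2021-11-15", "2022-02-03", "2023-11-20"))
format(aligned)
# "2024-11-01" "2025-02-01" "2024-11-01"
```

Dates from different seasons end up on the same axis position, which is what allows the seasons to be overlaid in a single plot.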

Examples

# Seasonal visualization of German influenza surveillance data
library(ggplot2)

# Align reporting weeks across seasons (epiweeks, season starting in week 28)
df_flu_aligned <- influenza_germany |>
  align_dates_seasonal(
    dates_from = ReportingWeek, date_resolution = "epiweek", start = 28
  )

ggplot(df_flu_aligned, aes(x = date_aligned, y = Incidence, color = season)) +
  geom_line() +
  facet_wrap(~AgeGroup) +
  theme_bw()