Assignment 2

This assignment uses the same data that you will use in the retail project later in the semester. Each student works with a different time series, selected using their student ID number as follows:

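```r
library(fpp3)

# Randomly pick one retail series, seeded by the student ID. Keep drawing
# until the chosen series has no missing Turnover values, including any
# implicit gaps that fill_gaps() turns into explicit NAs.
get_my_data <- function(student_id) {
  set.seed(student_id)
  all_data <- readr::read_rds("https://bit.ly/monashretaildata")
  while(TRUE) {
    retail <- filter(all_data, `Series ID` == sample(`Series ID`, 1))
    if(!any(is.na(fill_gaps(retail)$Turnover))) return(retail)
  }
}
# Replace the argument with your student ID
retail <- get_my_data(12345678)
```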

1. Using a test set of 2019–2022, fit an automatically selected ETS model and three benchmark methods to the training data. Which gives the best forecasts on the test set, based on RMSE?
2. Check the residuals from the best model using an ACF plot and a Ljung-Box test. Do the residuals appear to be white noise?
3. Now use time-series cross-validation with a minimum sample size of 15 years, a step size of 1 year, and a forecast horizon of 5 years. Calculate the RMSE of the results. Does this change the conclusion you reached using the test set?
4. Which of these two methods of evaluating accuracy is more reliable? Why?
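
The analysis steps above can be sketched with the fable and feasts functions loaded by `fpp3`. This is a minimal sketch under stated assumptions: `retail` is the monthly tsibble returned by `get_my_data()`, indexed by `Month` and covering at least 2019–2022. The assignment does not name the three benchmarks, so seasonal naïve, naïve and drift are an assumed choice, as is the Ljung-Box lag of 24 (twice the seasonal period, a common rule of thumb for monthly data). The helper `candidate_models()` and the objects `test_rmse`, `best_name`, `lb` and `cv_rmse` are hypothetical names used only for illustration.

```r
library(fpp3)

# Assumes `retail` was created by get_my_data() above: a monthly tsibble
# indexed by `Month`, with a `Turnover` column covering at least 2019-2022.
stopifnot(exists("retail"), "Turnover" %in% names(retail))

# Hypothetical helper: automatic ETS plus three benchmarks.
# ASSUMPTION: the benchmarks are seasonal naive, naive and drift.
candidate_models <- function(data) {
  model(data,
    ets    = ETS(Turnover),
    snaive = SNAIVE(Turnover),
    naive  = NAIVE(Turnover),
    drift  = RW(Turnover ~ drift())
  )
}

# Step 1: train on data before 2019, forecast the 48 months of 2019-2022,
# and rank the models by test-set RMSE (lowest first).
train <- retail |> filter(year(Month) < 2019)
fit <- candidate_models(train)
fc <- fit |> forecast(h = "4 years")
test_rmse <- fc |>
  accuracy(retail) |>
  select(.model, RMSE) |>
  arrange(RMSE)
print(test_rmse)

# Step 2: residual checks for the model with the lowest test-set RMSE.
best_name <- test_rmse$.model[1]
best_aug <- augment(fit) |> filter(.model == best_name)
print(best_aug |> ACF(.innov, lag_max = 24) |> autoplot())  # ACF plot
# ASSUMPTION: lag = 24; a small lb_pvalue suggests the residuals are not white noise.
lb <- best_aug |> features(.innov, ljung_box, lag = 24)
print(lb)

# Step 3: time-series cross-validation. The first fold has 15 years
# (180 months) of data, each later fold adds 1 year (12 months), and every
# fold forecasts 5 years (60 months) ahead. The last 60 months are kept out
# of the fold origins so each forecast can be compared with an actual value.
cv_data <- retail |>
  filter(Month <= max(Month) - 60) |>
  stretch_tsibble(.init = 15 * 12, .step = 12)
cv_fc <- candidate_models(cv_data) |> forecast(h = "5 years")
cv_rmse <- cv_fc |>
  accuracy(retail) |>   # pools errors across folds, one row per model
  select(.model, RMSE) |>
  arrange(RMSE)
print(cv_rmse)
```

Holding back the last 60 months before `stretch_tsibble()` is a design choice, not something the assignment specifies: it ensures that every fold's five-year forecast has matching actual data, so each fold contributes a full horizon to the cross-validated RMSE.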

Submit a single Quarto (.qmd) file that carries out all of the steps above. You may use this file as a starting point.

To receive full marks, the qmd file must compile without errors.

Due: 25 April 2025