Exercise Week 3: Solutions

library(fpp3)

fpp3 2.10, Ex 6

The aus_arrivals data set comprises quarterly international arrivals (in thousands) to Australia from Japan, New Zealand, UK and the US. Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries. Can you identify any unusual observations?

aus_arrivals |> autoplot(Arrivals) +
  labs(title="Quarterly international arrivals to Australia",
       y="Visitors ('000)")

Generally the number of arrivals to Australia is increasing over the entire series, with the exception of Japanese visitors which begin to decline after 1995. The series appear to have a seasonal pattern which varies proportionately to the number of arrivals. Interestingly, the number of visitors from NZ peaks sharply in 1988. The seasonal pattern from Japan appears to change substantially.

aus_arrivals |> gg_season(Arrivals, labels = "both")

The seasonal pattern of arrivals appears to vary between each country. In particular, arrivals from the UK appears to be lowest in Q2 and Q3, and increase substantially for Q4 and Q1. Whereas for NZ visitors, the lowest period of arrivals is in Q1, and highest in Q3. Similar variations can be seen for Japan and US.

aus_arrivals |> gg_subseries(Arrivals)

The subseries plot reveals more interesting features. It is evident that whilst the UK arrivals is increasing, most of this increase is seasonal. More arrivals are coming during Q1 and Q4, whilst the increase in Q2 and Q3 is less extreme. The growth in arrivals from NZ and US appears fairly similar across all quarters. There exists an unusual spike in arrivals from the US in 1992 Q3.

Unusual observations:

  • 2000 Q3: Spikes from the US (Sydney Olympics arrivals)
  • 2001 Q3-Q4 are unusual for US (9/11 effect)
  • 1991 Q3 is unusual for the US (Gulf war effect?)

fpp3 2.10, Ex 7

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Explore your chosen retail time series using the following functions:

autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() |> autoplot()
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries |>
  autoplot(Turnover) +
  labs(y = "Turnover (million $AUD)", x = "Time (Years)",
       title = myseries$Industry[1],
       subtitle = myseries$State[1])

The data features a non-linear upward trend and a strong seasonal pattern. The variability in the data appears proportional to the amount of turnover (level of the series) over the time period.

myseries |>
  gg_season(Turnover, labels = "both") +
  labs(y = "Turnover (million $AUD)",
       title = myseries$Industry[1],
       subtitle = myseries$State[1])

Strong seasonality is evident in the season plot. Large increases in clothing retailing can be observed in December (probably a Christmas effect). There is also a peak in July that appears to be getting stronger over time. 2016 had an unusual pattern in the first half of the year.

myseries |>
  gg_subseries(Turnover) +
  labs(y = "Turnover (million $AUD)", x="")

There is a strong trend in all months, with the largest trend in December and a larger increase in July and August than most other months.

myseries |>
  gg_lag(Turnover, lags=1:24, geom='point') + facet_wrap(~ .lag, ncol=6)

myseries |>
  ACF(Turnover, lag_max = 50) |>
  autoplot()

fpp3 2.10, Ex 8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and us_gasoline.

  • Can you spot any seasonality, cyclicity and trend?
  • What do you learn about the series?
  • What can you say about the seasonal patterns?
  • Can you identify any unusual years?

Total Private Employment in the US

us_employment |>
  filter(Title == "Total Private") |>
  autoplot(Employed)

There is a strong trend and seasonality. Some cyclic behaviour is seen, with a big drop due to the global financial crisis.

us_employment |>
  filter(Title == "Total Private") |>
  gg_season(Employed)

us_employment |>
  filter(Title == "Total Private") |>
  gg_subseries(Employed)

us_employment |>
  filter(Title == "Total Private") |>
  gg_lag(Employed)

us_employment |>
  filter(Title == "Total Private") |>
  ACF(Employed) |>
  autoplot()

In all of these plots, the trend is so dominant that it is hard to see anything else. We need to remove the trend so we can explore the other features of the data.

Brick production in Australia

aus_production |>
  autoplot(Bricks)

A positive trend in the first 20 years, and a negative trend in the next 25 years. Strong quarterly seasonality, with some cyclicity – note the recessions in the 1970s and 1980s.

aus_production |>
  gg_season(Bricks)

Brick production tends to be lowest in the first quarter and peak in either quarter 2 or quarter 3.

aus_production |>
  gg_subseries(Bricks)

The decrease in the last 25 years has been weakest in Q1.

aus_production |>
  gg_lag(Bricks, geom='point')

aus_production |>
  ACF(Bricks) |> autoplot()

The seasonality shows up as peaks at lags 4, 8, 12, 16, 20, …. The trend is seen with the slow decline on the positive side.

Snow hare trappings in Canada

pelt |>
  autoplot(Hare)

There is some cyclic behaviour with substantial variation in the length of the period.

pelt |>
  gg_lag(Hare, geom='point')

pelt |>
  ACF(Hare) |> autoplot()

The cyclic period seems to have an average of about 10 (due to the local maximum in ACF at lag 10).

H02 sales in Australia

There are four series corresponding to H02 sales, so we will add them together.

h02 <- PBS |>
  filter(ATC2 == "H02") |>
  group_by(ATC2) |>
  summarise(Cost = sum(Cost)) |>
  ungroup()
h02 |>
  autoplot(Cost)

A positive trend with strong monthly seasonality, dropping suddenly every February.

h02 |>
  gg_season(Cost)

h02 |>
  gg_subseries(Cost)

The trends have been greater in the higher peaking months – this leads to increasing seasonal variation.

h02 |>
  gg_lag(Cost, geom='point', lags=1:16)

h02 |>
  ACF(Cost) |> autoplot()

The large January sales show up as a separate cluster of points in the lag plots. The strong seasonality is clear in the ACF plot.

US gasoline sales

us_gasoline |>
  autoplot(Barrels)

A positive trend until 2008, and then the global financial crisis led to a drop in sales until 2012. The shape of the seasonality seems to have changed over time.

us_gasoline |>
  gg_season(Barrels)

There is a lot of noise making it hard to see the overall seasonal pattern. However, it seems to drop towards the end of quarter 4.

us_gasoline |>
  gg_subseries(Barrels)

The blue lines are helpful in seeing the average seasonal pattern.

us_gasoline |>
  gg_lag(Barrels, geom='point', lags=1:16)

us_gasoline |>
  ACF(Barrels, lag_max = 150) |> autoplot()

The seasonality is seen if we increase the lags to at least 2 years (approx 104 weeks)

fpp3 2.10, Ex 9

The following time plots and ACF plots correspond to four different time series. Your task is to match each time plot in the first row with one of the ACF plots in the second row.

1-B, 2-A, 3-D, 4-C

fpp3 2.10, Ex 10

The aus_livestock data contains the monthly total number of pigs slaughtered in Victoria, Australia, from Jul 1972 to Dec 2018. Use filter() to extract pig slaughters in Victoria between 1990 and 1995. Use autoplot and ACF for this data. How do they differ from white noise? If a longer period of data is used, what difference does it make to the ACF?

vic_pigs <- aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria", between(year(Month), 1990, 1995))
vic_pigs
# A tsibble: 72 x 4 [1M]
# Key:       Animal, State [1]
      Month Animal State    Count
      <mth> <fct>  <fct>    <dbl>
 1 1990 Jan Pigs   Victoria 76000
 2 1990 Feb Pigs   Victoria 78100
 3 1990 Mar Pigs   Victoria 77600
 4 1990 Apr Pigs   Victoria 84100
 5 1990 May Pigs   Victoria 98000
 6 1990 Jun Pigs   Victoria 89100
 7 1990 Jul Pigs   Victoria 93500
 8 1990 Aug Pigs   Victoria 84700
 9 1990 Sep Pigs   Victoria 74500
10 1990 Oct Pigs   Victoria 91900
# ℹ 62 more rows
vic_pigs |>
  autoplot(Count)

Although the values appear to vary erratically between months, a general upward trend is evident between 1990 and 1995. In contrast, a white noise plot does not exhibit any trend.

vic_pigs |> ACF(Count) |> autoplot()

The first 14 lags are significant, as the ACF slowly decays. This suggests that the data contains a trend. A white noise ACF plot would not usually contain any significant lags. The large spike at lag 12 suggests there is some seasonality in the data.

aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  ACF(Count) |>
  autoplot()

The longer series has much larger autocorrelations, plus clear evidence of seasonality at the seasonal lags of 12, 24, \dots.

fpp3 2.10, Ex 11

  1. Use the following code to compute the daily changes in Google closing stock prices.

    dgoog <- gafa_stock |>
      filter(Symbol == "GOOG", year(Date) >= 2018) |>
      mutate(trading_day = row_number()) |>
      update_tsibble(index = trading_day, regular = TRUE) |>
      mutate(diff = difference(Close))
  2. Why was it necessary to re-index the tsibble?

  3. Plot these differences and their ACF.

  4. Do the changes in the stock prices look like white noise?

dgoog <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2018) |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE) |>
  mutate(diff = difference(Close))

The tsibble needed re-indexing as trading happens irregularly. The new index is based only on trading days.

dgoog |> autoplot(diff)

dgoog |> ACF(diff, lag_max=100) |> autoplot()

There are some small significant autocorrelations out to lag 24, but nothing after that. Given the probability of a false positive is 5%, these look similar to white noise.