Exercise Week 2: Solutions

fpp3 2.10, Ex 1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

  • Use ? (or help()) to find out about the data in each series.
  • What is the time interval of each series?
  • Use autoplot() to produce a time plot of each series.
  • For the last plot, modify the axis labels and title.

Bricks

aus_production
# A tsibble: 218 x 7 [1Q]
   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
 1 1956 Q1   284    5225    189    465        3923     5
 2 1956 Q2   213    5178    204    532        4436     6
 3 1956 Q3   227    5297    208    561        4806     7
 4 1956 Q4   308    5681    197    570        4418     6
 5 1957 Q1   262    5577    187    529        4339     5
 6 1957 Q2   228    5651    214    604        4811     7
 7 1957 Q3   236    5317    227    603        5259     7
 8 1957 Q4   320    6152    222    582        4735     6
 9 1958 Q1   272    5758    199    554        4608     5
10 1958 Q2   233    5641    229    620        5196     7
# ℹ 208 more rows

The observations are quarterly.
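
We can also query the interval directly with interval(), as is done for pelt below; it reports 1Q (quarterly):

interval(aus_production)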

aus_production |> autoplot(Bricks)

An upward trend is apparent until 1980, after which clay brick production starts to decline. A strong quarterly seasonal pattern is evident, along with sharp drops in some quarters.

Lynx

interval(pelt)
<interval[1]>
[1] 1Y

Observations are made once per year.

pelt |> autoplot(Lynx)

Canadian lynx trappings are cyclic rather than seasonal: the size of the peaks is unpredictable, and the spacing between them is irregular, averaging roughly 10 years.
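
One way to check the roughly 10-year spacing (a quick sketch) is via the ACF, which should show a local peak near lag 10:

pelt |> ACF(Lynx) |> autoplot()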

Close

gafa_stock
# A tsibble: 5,032 x 8 [!]
# Key:       Symbol [4]
   Symbol Date        Open  High   Low Close Adj_Close    Volume
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
 1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
 2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
 3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
 4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
 5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
 6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
 7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
 8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200
 9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400
10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700
# ℹ 5,022 more rows

The interval is daily. Looking closer at the data, we can see that the index is a Date variable, and observations occur only on trading days, creating lots of implicit missing values.
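
A quick check (a sketch): tabulating the weekdays present confirms that only trading days (Monday to Friday) appear in the data.

gafa_stock |>
  as_tibble() |>
  mutate(weekday = wday(Date, label = TRUE, week_start = 1)) |>
  count(weekday)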

gafa_stock |>
  autoplot(Close)

Prices for these four technology stocks rose for most of the series, until mid-to-late 2018.

The four stocks are on different scales, so they are not directly comparable. A plot with faceting would be better.

gafa_stock |>
  ggplot(aes(x=Date, y=Close, group=Symbol)) +
  geom_line(aes(col=Symbol)) +
  facet_grid(Symbol ~ ., scales='free')

The downturn in the second half of 2018 is now very clear, with Facebook taking a big drop (about 20%) in the middle of the year.

The stocks tend to move roughly together, as you would expect with companies in the same industry.
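
A rough check of the co-movement (a sketch): correlations between the closing prices. Note that these are correlations of price levels, so they largely reflect the shared upward trend rather than day-to-day co-movement.

gafa_stock |>
  as_tibble() |>
  select(Symbol, Date, Close) |>
  pivot_wider(names_from = Symbol, values_from = Close) |>
  select(-Date) |>
  cor(use = "complete.obs")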

Demand

vic_elec
# A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
   Time                Demand Temperature Date       Holiday
   <dttm>               <dbl>       <dbl> <date>     <lgl>  
 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE   
 7 2012-01-01 03:00:00  3694.        20.1 2012-01-01 TRUE   
 8 2012-01-01 03:30:00  3562.        19.6 2012-01-01 TRUE   
 9 2012-01-01 04:00:00  3433.        19.1 2012-01-01 TRUE   
10 2012-01-01 04:30:00  3359.        19.0 2012-01-01 TRUE   
# ℹ 52,598 more rows

Data are available at 30-minute intervals.

vic_elec |>
  autoplot(Demand)

The series appears to have an annual seasonal pattern, with demand higher during summer and winter. We can't see much detail at this scale, so let's zoom in.

vic_elec |>
  filter(yearmonth(Time) == yearmonth("2012 June")) |>
  autoplot(Demand)

There appears to be a daily pattern, with less electricity used overnight, and a working-day effect, with less demand on weekends and holidays.
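
One way to make the working-day effect visible (a sketch, aggregating to daily totals and marking weekends and holidays):

vic_elec |>
  filter(yearmonth(Time) == yearmonth("2012 June")) |>
  index_by(Day = as_date(Time)) |>
  summarise(Demand = sum(Demand), Holiday = any(Holiday)) |>
  # wday() with defaults: 1 = Sunday, 7 = Saturday
  mutate(DayType = if_else(Holiday | wday(Day) %in% c(1, 7),
                           "Weekend/holiday", "Working day")) |>
  ggplot(aes(x = Day, y = Demand)) +
  geom_line() +
  geom_point(aes(colour = DayType))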

vic_elec |> autoplot(Demand/1e3) +
  labs(
    x = "Date",
    y = "Demand (GW)",
    title = "Half-hourly electricity demand",
    subtitle = "Victoria, Australia"
  )

Here the annual seasonality is clear, with high volatility in summer, and peaks in summer and winter. The weekly seasonality is also visible, but the daily seasonality is hidden due to the compression on the horizontal axis.

fpp3 2.10, Ex 2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock |>
  group_by(Symbol) |>
  filter(Close == max(Close)) |>
  ungroup() |>
  select(Symbol, Date, Close)
# A tsibble: 4 x 3 [!]
# Key:       Symbol [4]
  Symbol Date       Close
  <chr>  <date>     <dbl>
1 AAPL   2018-10-03  232.
2 AMZN   2018-09-04 2040.
3 FB     2018-07-25  218.
4 GOOG   2018-07-26 1268.
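
An equivalent approach (a sketch; the by argument needs dplyr 1.1 or later) uses slice_max() instead of filter():

gafa_stock |>
  as_tibble() |>
  slice_max(Close, n = 1, by = Symbol) |>
  select(Symbol, Date, Close)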

fpp3 2.10, Ex 3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

download.file("http://OTexts.com/fpp3/extrafiles/tute1.csv",
              tute1 <- tempfile())
tute1 <- readr::read_csv(tute1)
View(tute1)
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

mytimeseries |>
  pivot_longer(-Quarter, names_to="Key", values_to="Value") |>
  ggplot(aes(x = Quarter, y = Value, colour = Key)) +
    geom_line() +
    facet_grid(vars(Key), scales = "free_y")

# Without faceting:
mytimeseries |>
  pivot_longer(-Quarter, names_to="Key", values_to="Value") |>
  ggplot(aes(x = Quarter, y = Value, colour = Key)) +
    geom_line()

fpp3 2.10, Ex 4

The USgas package contains data on the demand for natural gas in the US.

  1. Install the USgas package.
  2. Create a tsibble from us_total with year as the index and state as the key.
  3. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

install.packages("USgas", repos = "https://cran.ms.unimelb.edu.au/")
library(USgas)
us_tsibble <- us_total |>
  as_tsibble(index=year, key=state)
# Plot annual consumption for the New England states
us_tsibble |>
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
                      "Connecticut", "Rhode Island")) |>
  autoplot(y/1e3) +
  labs(y = "billion cubic feet")

fpp3 2.10, Ex 5

  1. Download tourism.xlsx from the book website and read it into R using read_excel() from the readxl package.
  2. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
  3. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
  4. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
download.file("http://OTexts.com/fpp3/extrafiles/tourism.xlsx",
              tourism_file <- tempfile(), mode = "wb")
my_tourism <- readxl::read_excel(tourism_file) |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(
    index = Quarter,
    key = c(Region, State, Purpose)
  )
my_tourism
# A tsibble: 24,320 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# ℹ 24,310 more rows
tourism
# A tsibble: 24,320 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# ℹ 24,310 more rows
my_tourism |>
  as_tibble() |>
  summarise(Trips = mean(Trips), .by=c(Region, Purpose)) |>
  filter(Trips == max(Trips))
# A tibble: 1 × 3
  Region Purpose  Trips
  <chr>  <chr>    <dbl>
1 Sydney Visiting  747.
# summarise() on a grouped tsibble aggregates over the key variables
# within each quarter, keeping the Quarter index: total trips by State.
state_tourism <- my_tourism |>
  group_by(State) |>
  summarise(Trips = sum(Trips)) |>
  ungroup()
state_tourism
# A tsibble: 640 x 3 [1Q]
# Key:       State [8]
   State Quarter Trips
   <chr>   <qtr> <dbl>
 1 ACT   1998 Q1  551.
 2 ACT   1998 Q2  416.
 3 ACT   1998 Q3  436.
 4 ACT   1998 Q4  450.
 5 ACT   1999 Q1  379.
 6 ACT   1999 Q2  558.
 7 ACT   1999 Q3  449.
 8 ACT   1999 Q4  595.
 9 ACT   2000 Q1  600.
10 ACT   2000 Q2  557.
# ℹ 630 more rows

fpp3 2.10, Ex 6

The aus_arrivals data set comprises quarterly international arrivals (in thousands) to Australia from Japan, New Zealand, UK and the US. Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries. Can you identify any unusual observations?

aus_arrivals |> autoplot(Arrivals)

Generally the number of arrivals to Australia is increasing over the entire series, with the exception of arrivals from Japan, which begin to decline after 1995. The series appear to have a seasonal pattern that varies proportionately with the number of arrivals. Interestingly, the number of visitors from NZ peaks sharply in 1988. The seasonal pattern for Japan appears to change substantially.

#| fig-height: 10
aus_arrivals |> gg_season(Arrivals, labels = "both")

The seasonal pattern of arrivals appears to vary between the countries. In particular, arrivals from the UK appear to be lowest in Q2 and Q3, and increase substantially in Q4 and Q1, whereas for NZ visitors the lowest arrivals are in Q1 and the highest in Q3. Similar variations can be seen for Japan and the US.

aus_arrivals |> gg_subseries(Arrivals)

The subseries plot reveals more interesting features. It is evident that whilst UK arrivals are increasing, most of this increase is seasonal: more arrivals come in Q1 and Q4, whilst the increase in Q2 and Q3 is less pronounced. The growth in arrivals from NZ and the US appears fairly similar across all quarters. There is an unusual spike in arrivals from the US in 1992 Q3.

Unusual observations:

  • 2000 Q3: Spikes from the US (Sydney Olympics arrivals)
  • 2001 Q3-Q4 are unusual for US (9/11 effect)
  • 1991 Q3 is unusual for the US (Gulf war effect?)

fpp3 2.10, Ex 7

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Explore your chosen retail time series using the following functions:

autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() |> autoplot()

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries |>
  autoplot(Turnover) +
  labs(y = "Turnover (million $AUD)", x = "Time (Years)",
       title = myseries$Industry[1],
       subtitle = myseries$State[1])

The data features a non-linear upward trend and a strong seasonal pattern. The variability in the data appears proportional to the amount of turnover (level of the series) over the time period.
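
Since the variation grows with the level of the series, a log transformation (a sketch) makes the seasonal fluctuations more uniform across the series:

myseries |>
  autoplot(log(Turnover)) +
  labs(y = "log Turnover")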

myseries |>
  gg_season(Turnover, labels = "both") +
  labs(y = "Turnover (million $AUD)",
       title = myseries$Industry[1],
       subtitle = myseries$State[1])

Strong seasonality is evident in the season plot. Large increases in clothing retailing can be observed in December (probably a Christmas effect). There is also a peak in July that appears to be getting stronger over time. 2016 had an unusual pattern in the first half of the year.

myseries |>
  gg_subseries(Turnover) +
  labs(y = "Turnover (million $AUD)", x="")

There is a strong trend in all months, with the largest trend in December and a larger increase in July and August than most other months.

myseries |>
  gg_lag(Turnover, lags=1:24, geom='point') + facet_wrap(~ .lag, ncol=6)

myseries |>
  ACF(Turnover, lag_max = 50) |>
  autoplot()

fpp3 2.10, Ex 8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and us_gasoline.

  • Can you spot any seasonality, cyclicity and trend?
  • What do you learn about the series?
  • What can you say about the seasonal patterns?
  • Can you identify any unusual years?

Total Private Employment in the US

us_employment |>
  filter(Title == "Total Private") |>
  autoplot(Employed)

There is a strong trend and seasonality. Some cyclic behaviour is seen, with a big drop due to the global financial crisis.

us_employment |>
  filter(Title == "Total Private") |>
  gg_season(Employed)

us_employment |>
  filter(Title == "Total Private") |>
  gg_subseries(Employed)

us_employment |>
  filter(Title == "Total Private") |>
  gg_lag(Employed)

us_employment |>
  filter(Title == "Total Private") |>
  ACF(Employed) |>
  autoplot()

In all of these plots, the trend is so dominant that it is hard to see anything else. We need to remove the trend so we can explore the other features of the data.
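
One quick way to remove the trend (a sketch) is to plot the month-to-month changes using difference():

us_employment |>
  filter(Title == "Total Private") |>
  autoplot(difference(Employed)) +
  labs(y = "Monthly change in employment")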

Brick production in Australia

aus_production |>
  autoplot(Bricks)

A positive trend in the first 20 years, and a negative trend in the next 25 years. Strong quarterly seasonality, with some cyclicity – note the recessions in the 1970s and 1980s.

aus_production |>
  gg_season(Bricks)

Brick production tends to be lowest in the first quarter and peak in either quarter 2 or quarter 3.

aus_production |>
  gg_subseries(Bricks)

The decrease in the last 25 years has been weakest in Q1.

aus_production |>
  gg_lag(Bricks, geom='point')

aus_production |>
  ACF(Bricks) |> autoplot()

The seasonality shows up as peaks at lags 4, 8, 12, 16, 20, …. The trend is seen with the slow decline on the positive side.

Snowshoe hare trappings in Canada

pelt |>
  autoplot(Hare)

There is some cyclic behaviour with substantial variation in the length of the period.

pelt |>
  gg_lag(Hare, geom='point')

pelt |>
  ACF(Hare) |> autoplot()

The cyclic period seems to have an average of about 10 (due to the local maximum in ACF at lag 10).

H02 sales in Australia

There are four series corresponding to H02 sales, so we will add them together.

h02 <- PBS |>
  filter(ATC2 == "H02") |>
  group_by(ATC2) |>
  summarise(Cost = sum(Cost)) |>
  ungroup()
h02 |>
  autoplot(Cost)

A positive trend with strong monthly seasonality, dropping suddenly every February.

h02 |>
  gg_season(Cost)

h02 |>
  gg_subseries(Cost)

The trends have been greater in the higher-peaking months, which leads to increasing seasonal variation.

h02 |>
  gg_lag(Cost, geom='point', lags=1:16)

h02 |>
  ACF(Cost) |> autoplot()

The large January sales show up as a separate cluster of points in the lag plots. The strong seasonality is clear in the ACF plot.

US gasoline sales

us_gasoline |>
  autoplot(Barrels)

A positive trend until 2008, and then the global financial crisis led to a drop in sales until 2012. The shape of the seasonality seems to have changed over time.

us_gasoline |>
  gg_season(Barrels)

There is a lot of noise making it hard to see the overall seasonal pattern. However, it seems to drop towards the end of quarter 4.

us_gasoline |>
  gg_subseries(Barrels)

The blue lines are helpful in seeing the average seasonal pattern.

us_gasoline |>
  gg_lag(Barrels, geom='point', lags=1:16)

us_gasoline |>
  ACF(Barrels, lag_max = 150) |> autoplot()

The seasonality is seen if we increase the lags to at least 2 years (approximately 104 weeks).

fpp3 2.10, Ex 9

The following time plots and ACF plots correspond to four different time series. Your task is to match each time plot in the first row with one of the ACF plots in the second row.

1-B, 2-A, 3-D, 4-C

fpp3 2.10, Ex 10

The aus_livestock data contains the monthly total number of pigs slaughtered in Victoria, Australia, from Jul 1972 to Dec 2018. Use filter() to extract pig slaughters in Victoria between 1990 and 1995. Use autoplot and ACF for this data. How do they differ from white noise? If a longer period of data is used, what difference does it make to the ACF?

vic_pigs <- aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria", between(year(Month), 1990, 1995))
vic_pigs
# A tsibble: 72 x 4 [1M]
# Key:       Animal, State [1]
      Month Animal State    Count
      <mth> <fct>  <fct>    <dbl>
 1 1990 Jan Pigs   Victoria 76000
 2 1990 Feb Pigs   Victoria 78100
 3 1990 Mar Pigs   Victoria 77600
 4 1990 Apr Pigs   Victoria 84100
 5 1990 May Pigs   Victoria 98000
 6 1990 Jun Pigs   Victoria 89100
 7 1990 Jul Pigs   Victoria 93500
 8 1990 Aug Pigs   Victoria 84700
 9 1990 Sep Pigs   Victoria 74500
10 1990 Oct Pigs   Victoria 91900
# ℹ 62 more rows
vic_pigs |>
  autoplot(Count)

Although the values appear to vary erratically between months, a general upward trend is evident between 1990 and 1995. In contrast, a white noise plot does not exhibit any trend.

vic_pigs |> ACF(Count) |> autoplot()

The first 14 lags are significant, as the ACF slowly decays. This suggests that the data contains a trend. A white noise ACF plot would not usually contain any significant lags. The large spike at lag 12 suggests there is some seasonality in the data.
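
For comparison (a sketch; the name wn and the seed are arbitrary), we can simulate a white noise series of the same length and look at its time plot and ACF, where almost all spikes should lie within the bounds:

# Simulate 72 months of white noise as a tsibble
set.seed(1)
wn <- tsibble(
  Month = yearmonth("1990 Jan") + 0:71,
  Count = rnorm(72),
  index = Month
)
wn |> autoplot(Count)
wn |> ACF(Count) |> autoplot()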

aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  ACF(Count) |>
  autoplot()

The longer series has much larger autocorrelations, plus clear evidence of seasonality at the seasonal lags of 12, 24, and so on.

fpp3 2.10, Ex 11

  1. Use the following code to compute the daily changes in Google closing stock prices.

    dgoog <- gafa_stock |>
      filter(Symbol == "GOOG", year(Date) >= 2018) |>
      mutate(trading_day = row_number()) |>
      update_tsibble(index = trading_day, regular = TRUE) |>
      mutate(diff = difference(Close))
  2. Why was it necessary to re-index the tsibble?

  3. Plot these differences and their ACF.

  4. Do the changes in the stock prices look like white noise?

dgoog <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2018) |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE) |>
  mutate(diff = difference(Close))

The tsibble needed re-indexing because trading happens only on trading days, so the Date index is irregular, with implicit gaps on weekends and holidays. The new index simply counts trading days, making the series regular.
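
We can see this by checking the interval before and after re-indexing (a sketch): the Date-indexed series reports an irregular interval ([!]), while the re-indexed series has a regular interval of 1 trading day.

gafa_stock |> filter(Symbol == "GOOG") |> interval()
dgoog |> interval()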

dgoog |> autoplot(diff)

dgoog |> ACF(diff, lag_max=100) |> autoplot()

There are some small significant autocorrelations out to lag 24, but nothing after that. Given that each autocorrelation has roughly a 5% chance of exceeding the bounds even for white noise, these look similar to white noise.
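
As a rough check on the bounds: for a white noise series, each sample autocorrelation is approximately distributed as N(0, 1/T), so the dashed lines sit at ±1.96/√T. With roughly T = 250 trading days here, that is about ±0.12, and around 5 in every 100 autocorrelations would be expected to cross the bounds even for genuine white noise.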