library(fpp3)
Exercise Week 2: Solutions
fpp3 2.10, Ex 1
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, and Demand from vic_elec.
- Use ? (or help()) to find out about the data in each series.
- What is the time interval of each series?
- Use autoplot() to produce a time plot of each series.
- For the last plot, modify the axis labels and title.
Bricks
aus_production
# A tsibble: 218 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1956 Q1 284 5225 189 465 3923 5
2 1956 Q2 213 5178 204 532 4436 6
3 1956 Q3 227 5297 208 561 4806 7
4 1956 Q4 308 5681 197 570 4418 6
5 1957 Q1 262 5577 187 529 4339 5
6 1957 Q2 228 5651 214 604 4811 7
7 1957 Q3 236 5317 227 603 5259 7
8 1957 Q4 320 6152 222 582 4735 6
9 1958 Q1 272 5758 199 554 4608 5
10 1958 Q2 233 5641 229 620 5196 7
# ℹ 208 more rows
The observations are quarterly.
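The help file and the interval can also be checked directly. A minimal sketch (output not shown):

help(aus_production)      # describes each variable, including Bricks
interval(aus_production)  # should report a quarterly (1Q) interval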
aus_production |>
  autoplot(Bricks)
An upward trend is apparent until 1980, after which the number of clay bricks being produced starts to decline. A seasonal pattern is evident in the data, and sharp drops in some quarters can also be seen.
Lynx
interval(pelt)
<interval[1]>
[1] 1Y
Observations are made once per year.
pelt |>
  autoplot(Lynx)
Canadian lynx trappings are cyclic, as the extent of peak trappings is unpredictable, and the spacing between the peaks is irregular but approximately 10 years.
Close
gafa_stock
# A tsibble: 5,032 x 8 [!]
# Key: Symbol [4]
Symbol Date Open High Low Close Adj_Close Volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
# ℹ 5,022 more rows
Interval is daily. Looking closer at the data, we can see that the index is a Date variable. It also appears that observations occur only on trading days, creating lots of implicit missing values.
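As a rough check on those implicit gaps, we can compare the number of observed trading days per stock with the number of calendar days each series spans. This is just a sketch (the column names observed_days and calendar_days are made up here; output not shown):

gafa_stock |>
  as_tibble() |>
  group_by(Symbol) |>
  summarise(
    observed_days = n(),                                   # rows actually present
    calendar_days = as.integer(max(Date) - min(Date)) + 1  # days spanned by the series
  )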
gafa_stock |>
  autoplot(Close)
Stock prices for these technology stocks have risen for most of the series, until mid-late 2018.
The four stocks are on different scales, so they are not directly comparable. A plot with faceting would be better.
gafa_stock |>
  ggplot(aes(x = Date, y = Close, group = Symbol)) +
  geom_line(aes(col = Symbol)) +
  facet_grid(Symbol ~ ., scales = "free")
The downturn in the second half of 2018 is now very clear, with Facebook taking a big drop (about 20%) in the middle of the year.
The stocks tend to move roughly together, as you would expect with companies in the same industry.
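One way to make the co-movement easier to judge is to rescale each stock relative to its first closing price so that all four series start at 1. A minimal sketch (the Relative column is introduced here for illustration):

gafa_stock |>
  group_by(Symbol) |>
  mutate(Relative = Close / first(Close)) |>  # index each stock to its first close
  ungroup() |>
  autoplot(Relative)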
Demand
vic_elec
# A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
Time Demand Temperature Date Holiday
<dttm> <dbl> <dbl> <date> <lgl>
1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
# ℹ 52,598 more rows
Data is available at 30-minute intervals.
vic_elec |>
  autoplot(Demand)
Demand appears to have an annual seasonal pattern, with higher demand during summer and winter. We can't see much detail at this scale, so let's zoom in.
vic_elec |>
  filter(yearmonth(Time) == yearmonth("2012 June")) |>
  autoplot(Demand)
Demand appears to have a daily pattern, with less electricity used overnight. There also appears to be a working-day effect, with lower demand on weekends and holidays.
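Zooming in further to a single week (here Monday 4 June to Sunday 10 June 2012, chosen arbitrarily) makes the working-day effect easier to see. A minimal sketch:

vic_elec |>
  filter(Date >= as.Date("2012-06-04"), Date <= as.Date("2012-06-10")) |>
  autoplot(Demand)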
vic_elec |>
  autoplot(Demand / 1e3) +
  labs(
    x = "Date",
    y = "Demand (GW)",
    title = "Half-hourly electricity demand",
    subtitle = "Victoria, Australia"
  )
Here the annual seasonality is clear, with high volatility in summer, and peaks in summer and winter. The weekly seasonality is also visible, but the daily seasonality is hidden due to the compression on the horizontal axis.
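The daily pattern hidden here can be brought out with a seasonal plot; a minimal sketch using gg_season() (from the feasts package, loaded with fpp3):

vic_elec |>
  gg_season(Demand, period = "day") +
  labs(title = "Half-hourly electricity demand: daily pattern")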
fpp3 2.10, Ex 2
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
gafa_stock |>
  group_by(Symbol) |>
  filter(Close == max(Close)) |>
  ungroup() |>
  select(Symbol, Date, Close)
# A tsibble: 4 x 3 [!]
# Key: Symbol [4]
Symbol Date Close
<chr> <date> <dbl>
1 AAPL 2018-10-03 232.
2 AMZN 2018-09-04 2040.
3 FB 2018-07-25 218.
4 GOOG 2018-07-26 1268.
fpp3 2.10, Ex 3
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
download.file("http://OTexts.com/fpp3/extrafiles/tute1.csv",
  tute1 <- tempfile())
tute1 <- readr::read_csv(tute1)
tute1
# A tibble: 100 × 4
Quarter Sales AdBudget GDP
<date> <dbl> <dbl> <dbl>
1 1981-03-01 1020. 659. 252.
2 1981-06-01 889. 589 291.
3 1981-09-01 795 512. 291.
4 1981-12-01 1004. 614. 292.
5 1982-03-01 1058. 647. 279.
6 1982-06-01 944. 602 254
7 1982-09-01 778. 531. 296.
8 1982-12-01 932. 608. 272.
9 1983-03-01 996. 638. 260.
10 1983-06-01 908. 582. 280.
# ℹ 90 more rows
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
mytimeseries |>
  pivot_longer(-Quarter, names_to = "Key", values_to = "Value") |>
  ggplot(aes(x = Quarter, y = Value, colour = Key)) +
  geom_line() +
  facet_grid(vars(Key), scales = "free_y")
# Without faceting:
mytimeseries |>
  pivot_longer(-Quarter, names_to = "Key", values_to = "Value") |>
  ggplot(aes(x = Quarter, y = Value, colour = Key)) +
  geom_line()
fpp3 2.10, Ex 4
The
USgas
package contains data on the demand for natural gas in the US.
- Install the
USgas
package.- Create a tsibble from
us_total
with year as the index and state as the key.- Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
install.packages("USgas")
The following package(s) will be installed:
- USgas [0.1.2]
These packages will be installed into "~/git/Teaching/af/renv/library/linux-neon-jammy/R-4.4/x86_64-pc-linux-gnu".
# Installing packages --------------------------------------------------------
- Installing USgas ... OK [linked from cache]
Successfully installed 1 package in 14 milliseconds.
library(USgas)
us_tsibble <- us_total |>
  as_tsibble(index = year, key = state)
# For each state
us_tsibble |>
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
                      "Connecticut", "Rhode Island")) |>
  autoplot(y / 1e3) +
  labs(y = "billion cubic feet")
fpp3 2.10, Ex 5
- Download tourism.xlsx from the book website and read it into R using read_excel() from the readxl package.
- Create a tsibble which is identical to the tourism tsibble from the tsibble package.
- Find what combination of Region and Purpose had the maximum number of overnight trips on average.
- Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
download.file("http://OTexts.com/fpp3/extrafiles/tourism.xlsx",
  tourism_file <- tempfile(), mode = "wb")
my_tourism <- readxl::read_excel(tourism_file) |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(
    index = Quarter,
    key = c(Region, State, Purpose)
  )
my_tourism
# A tsibble: 24,320 x 5 [1Q]
# Key: Region, State, Purpose [304]
Quarter Region State Purpose Trips
<qtr> <chr> <chr> <chr> <dbl>
1 1998 Q1 Adelaide South Australia Business 135.
2 1998 Q2 Adelaide South Australia Business 110.
3 1998 Q3 Adelaide South Australia Business 166.
4 1998 Q4 Adelaide South Australia Business 127.
5 1999 Q1 Adelaide South Australia Business 137.
6 1999 Q2 Adelaide South Australia Business 200.
7 1999 Q3 Adelaide South Australia Business 169.
8 1999 Q4 Adelaide South Australia Business 134.
9 2000 Q1 Adelaide South Australia Business 154.
10 2000 Q2 Adelaide South Australia Business 169.
# ℹ 24,310 more rows
tourism
# A tsibble: 24,320 x 5 [1Q]
# Key: Region, State, Purpose [304]
Quarter Region State Purpose Trips
<qtr> <chr> <chr> <chr> <dbl>
1 1998 Q1 Adelaide South Australia Business 135.
2 1998 Q2 Adelaide South Australia Business 110.
3 1998 Q3 Adelaide South Australia Business 166.
4 1998 Q4 Adelaide South Australia Business 127.
5 1999 Q1 Adelaide South Australia Business 137.
6 1999 Q2 Adelaide South Australia Business 200.
7 1999 Q3 Adelaide South Australia Business 169.
8 1999 Q4 Adelaide South Australia Business 134.
9 2000 Q1 Adelaide South Australia Business 154.
10 2000 Q2 Adelaide South Australia Business 169.
# ℹ 24,310 more rows
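To confirm that the reconstructed tsibble really matches the packaged one, the two objects can be compared directly. A minimal sketch (output not shown):

# Compare the tsibble built from the Excel file with the tourism tsibble from the tsibble package
all.equal(my_tourism, tourism)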
my_tourism |>
  as_tibble() |>
  summarise(Trips = mean(Trips), .by = c(Region, Purpose)) |>
  filter(Trips == max(Trips))
# A tibble: 1 × 3
Region Purpose Trips
<chr> <chr> <dbl>
1 Sydney Visiting 747.
state_tourism <- my_tourism |>
  group_by(State) |>
  summarise(Trips = sum(Trips)) |>
  ungroup()
state_tourism
# A tsibble: 640 x 3 [1Q]
# Key: State [8]
State Quarter Trips
<chr> <qtr> <dbl>
1 ACT 1998 Q1 551.
2 ACT 1998 Q2 416.
3 ACT 1998 Q3 436.
4 ACT 1998 Q4 450.
5 ACT 1999 Q1 379.
6 ACT 1999 Q2 558.
7 ACT 1999 Q3 449.
8 ACT 1999 Q4 595.
9 ACT 2000 Q1 600.
10 ACT 2000 Q2 557.
# ℹ 630 more rows