Time Series in R: Forecasting and Visualisation
Time series in R
29 May 2017
Outline
1 ts objects
2 Time plots
3 Lab session 1
4 Seasonal plots
5 Seasonal or cyclic?
6 Lag plots and autocorrelation
7 Lab session 2
Time series
Time series consist of sequences of observations collected over time. We will assume the time periods are equally spaced.
Time series examples:
- Daily IBM stock prices
- Monthly rainfall
- Annual Google profits
- Quarterly Australian beer production
ts objects and ts function
A time series is stored in a ts object in R, which holds:
- a vector of numbers
- information about the times at which those numbers were recorded.
Example:
Year  Observation
2012  123
2013  39
2014  78
2015  52
2016  110
    y <- ts(c(123, 39, 78, 52, 110), start = 2012)
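A quick check of what the `ts()` call above stores, using only base R (`start()`, `end()`, `frequency()` and `time()` from the stats package). This is a minimal sketch; the comments describe what these calls return for this particular series.

```r
# Annual series from the example table: one observation per year, starting 2012
y <- ts(c(123, 39, 78, 52, 110), start = 2012)

start(y)      # 2012 1  (year, period within the year)
end(y)        # 2016 1
frequency(y)  # 1: annual data is the default when no frequency is given
time(y)       # the time attached to each observation: 2012, 2013, ..., 2016
```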
ts objects and ts function
For observations that are more frequent than once per year, add a frequency argument. For example, for monthly data stored as a numerical vector z:
    y <- ts(z, frequency = 12, start = c(2003, 1))
Here start = c(2003, 1) means the first period (January) of 2003.
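A sketch of the monthly case. The slide does not show `z`, so the vector below is hypothetical made-up data (24 values); only the `ts()` call mirrors the slide. `cycle()` and `window()` are base R helpers that give the month index of each observation and extract a sub-period.

```r
# Hypothetical stand-in for z: 24 made-up monthly values with a yearly swing
z <- round(100 + 10 * sin(2 * pi * (1:24) / 12), 1)

# Monthly series starting in January 2003 (year 2003, period 1)
y <- ts(z, frequency = 12, start = c(2003, 1))

end(y)                                  # 2004 12: 24 months from Jan 2003 end in Dec 2004
cycle(y)                                # month of each observation, 1 (Jan) to 12 (Dec)
y2004 <- window(y, start = c(2004, 1))  # keep only the 2004 observations
length(y2004)                           # 12
```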
ts objects and ts function
    ts(data, frequency, start)

Type of data   frequency              start example
Annual         1                      1995
Quarterly      4                      c(1995,2)
Monthly        12                     c(1995,9)
Daily          7 or 365.25            1 or c(1995,234)
Weekly         52.18                  c(1995,23)
Hourly         24 or 168 or 8,766     1
Half-hourly    48 or 336 or 17,532    1
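The non-integer frequencies in the table follow from an average year length of 365.25 days (allowing for leap years), and the larger hourly and half-hourly values correspond to a weekly or an annual cycle. A few lines of R arithmetic make this explicit; this is our reading of the table, not code from the course.

```r
days_per_year <- 365.25            # average calendar year, allowing for leap years

weekly      <- days_per_year / 7   # 52.17857..., shown as 52.18 in the table
hourly_week <- 24 * 7              # 168 hours in a week
hourly_year <- 24 * days_per_year  # 8766 hours in an average year
half_week   <- 48 * 7              # 336 half-hours in a week
half_year   <- 48 * days_per_year  # 17532 half-hours in an average year

round(c(weekly, hourly_week, hourly_year, half_week, half_year), 2)
```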
ts objects
- Class: ts
- Print and plotting methods available.
    ausgdp
    ##      Qtr1 Qtr2 Qtr3 Qtr4
    ## 1971           4612 4651
    ## 1972 4645 4615 4645 4722
    ## 1973 4780 4830 4887 4933
    ## 1974 4921 4875 4867 4905
    ## 1975 4938 4934 4942 4979
    ## 1976 5028 5079 5112 5127
ts objects
    start(ausgdp)
    ## [1] 1971    3
    end(ausgdp)
    ## [1] 1998    1
    frequency(ausgdp)
    ## [1] 4
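To see how `start = c(year, period)` and the quarterly print layout fit together, the sketch below rebuilds a quarterly series from the ausgdp values printed above. It covers 1971 Q3 to 1976 Q4 only, so it is a truncated copy rather than the full ausgdp series (which runs to 1998 Q1), and it needs only base R.

```r
# First 22 quarterly values of ausgdp as printed above (1971 Q3 to 1976 Q4)
gdp <- ts(c(4612, 4651,
            4645, 4615, 4645, 4722,
            4780, 4830, 4887, 4933,
            4921, 4875, 4867, 4905,
            4938, 4934, 4942, 4979,
            5028, 5079, 5112, 5127),
          start = c(1971, 3), frequency = 4)

gdp             # prints as a Qtr1..Qtr4 table, with 1971 Q1 and Q2 left blank
start(gdp)      # 1971 3
end(gdp)        # 1976 4
frequency(gdp)  # 4
```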
ts objects
Residential electricity sales:
    elecsales
    ## Time Series:
    ## Start = 1989
    ## End = 2008
    ## Frequency = 1
    ##  [1] 2354 2380 2319 2469 2386 2569 2576 2763 2844
    ## [10] 3001 3108 3358 3076 3181 3222 3176 3431 3527
    ## [19] 3638 3655
ts objects
    start(elecsales)
    ## [1] 1989    1
    end(elecsales)
    ## [1] 2008    1
    frequency(elecsales)
    ## [1] 1
fpp2
Main package used in this course:
    > library(fpp2)
This loads:
- some data for use in examples and exercises
- the forecast package (for forecasting functions)
- the ggplot2 package (for graphics)
- the fma package (for lots of time series data)
- the expsmooth package (for more time series data)
ts objects
    autoplot(ausgdp)
[Figure: time plot of ausgdp; x-axis Time]
Time plots
    autoplot(a10) + ylab("$ million") + xlab("Year") +
      ggtitle("Antidiabetic drug sales")
[Figure: "Antidiabetic drug sales"; x-axis Year, y-axis $ million]
Lab Session 1
Time plot
    autoplot(a10) + ylab("$ million") + xlab("Year") +
      ggtitle("Antidiabetic drug sales")
[Figure: "Antidiabetic drug sales"; x-axis Year, y-axis $ million]
Seasonal plot
    ggseasonplot(a10, year.labels = TRUE, year.labels.left = TRUE) +
      ylab("$ million") +
      ggtitle("Seasonal plot: antidiabetic drug sales")
[Figure: "Seasonal plot: antidiabetic drug sales"; one line per year from 1991 to 2008, labelled at both ends; x-axis Month (Jan to Dec), y-axis $ million]
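A seasonal plot re-arranges the series into one line per year drawn across the seasons. A base-R sketch of that idea, using the built-in `AirPassengers` monthly series (Jan 1949 to Dec 1960) as a stand-in for `a10`, which needs the fpp2 data; `matplot()` draws one line per year. This illustrates the idea only and is not how `ggseasonplot()` is implemented.

```r
x <- AirPassengers   # built-in monthly series, Jan 1949 to Dec 1960

# One row per year, one column per month (the series covers whole years)
m <- matrix(x, ncol = 12, byrow = TRUE,
            dimnames = list(start(x)[1]:end(x)[1], month.abb))

# Seasonal plot: each year is one line drawn across the 12 months
matplot(t(m), type = "l", lty = 1, xaxt = "n",
        xlab = "Month", ylab = "Passengers (thousands)")
axis(1, at = 1:12, labels = month.abb)
```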
Seasonal polar plots
    ggseasonplot(a10, polar = TRUE) + ylab("$ million")
[Figure: "Seasonal plot: a10" in polar coordinates; months arranged around the circle, one line per year from 1991 to 2008; radial axis $ million]
Seasonal subseries plots
    ggsubseriesplot(a10) + ylab("$ million") +
      ggtitle("Subseries plot: antidiabetic drug sales")
[Figure: "Subseries plot: antidiabetic drug sales"; one panel per month (Jan to Dec), y-axis $ million]
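Base R's `stats::monthplot()` draws the same kind of subseries plot: one mini time plot per season, with a horizontal line at that season's mean. A sketch with the built-in `AirPassengers` series standing in for `a10`; the `tapply()` line computes those per-month means directly.

```r
x <- AirPassengers

# Mean of each month across all years: the horizontal reference lines
# drawn in a subseries plot
month_means <- tapply(x, cycle(x), mean)
names(month_means) <- month.abb
round(month_means, 1)

# Subseries plot: one mini time plot per month, with a line at the month mean
monthplot(x, ylab = "Passengers (thousands)")
```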
Quarterly Australian Beer Production
    beer <- window(ausbeer, start = 1992)
    autoplot(beer)
[Figure: time plot of beer from 1992; x-axis Time]
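`window()` is part of base R (stats), so its behaviour can be tried without the fpp2 data. A sketch using the built-in quarterly `UKgas` series (UK gas consumption, starting 1960 Q1) as a stand-in for `ausbeer`:

```r
# Subset the built-in quarterly UKgas series from 1980 onwards
g <- window(UKgas, start = 1980)

start(g)       # 1980 1: a single year in `start` means the first period of that year
frequency(g)   # 4: window() keeps the quarterly frequency
plot(g, ylab = "Gas consumption")   # base-R time plot of the subset
```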
Quarterly Australian Beer Production
    ggseasonplot(beer, year.labels = TRUE)
[Figure: "Seasonal plot: beer"; one line per year from 1992 to 2010; x-axis Quarter (Q1 to Q4)]
Quarterly Australian Beer Production
    ggsubseriesplot(beer)
[Figure: subseries plot of beer; one panel per quarter (Q1 to Q4)]
Time series patterns
- Trend: a trend pattern exists when there is a long-term increase or decrease in the data.
- Seasonal: a seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).
- Cyclic: a cyclic pattern exists when the data exhibit rises and falls that are not of a fixed period (duration usually of at least 2 years).
Time series patterns
    autoplot(window(elec, start = 1980)) +
      ggtitle("Australian electricity production") +
      xlab("Year") + ylab("GWh")
[Figure: "Australian electricity production"; x-axis Year, y-axis GWh]
Time series patterns
    autoplot(bricksq) +
      ggtitle("Australian clay brick production") +
      xlab("Year") + ylab("million units")
[Figure: "Australian clay brick production"; x-axis Year, y-axis million units]
Time series patterns
    autoplot(ustreas) +
      ggtitle("US Treasury Bill Contracts") +
      xlab("Day") + ylab("price")
[Figure: "US Treasury Bill Contracts"; x-axis Day, y-axis price]
Time series patterns
    autoplot(lynx) +
      ggtitle("Annual Canadian Lynx Trappings") +
      xlab("Year") + ylab("Number trapped")
[Figure: "Annual Canadian Lynx Trappings"; x-axis Year, y-axis Number trapped]
Seasonal or cyclic?
Differences between seasonal and cyclic patterns:
- a seasonal pattern has constant length; a cyclic pattern has variable length
- the average length of a cycle is longer than the length of a seasonal pattern
- the magnitude of a cycle is more variable than the magnitude of a seasonal pattern
The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.
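One way to see the "variable length" point is to measure the gaps between successive peaks. A rough sketch using the built-in annual `lynx` series (the Canadian lynx trappings plotted earlier). The peak rule here, a year higher than both neighbouring years, is a crude assumption of ours and may also pick up minor bumps.

```r
x   <- as.numeric(lynx)         # annual lynx trappings, 1821 to 1934
yrs <- as.numeric(time(lynx))
n   <- length(x)

# Crude peak rule: a year counts as a peak if it beats both neighbouring years
inner   <- 2:(n - 1)
is_peak <- c(FALSE, x[inner] > x[inner - 1] & x[inner] > x[inner + 1], FALSE)
peak_years <- yrs[is_peak]

gaps <- diff(peak_years)        # years between successive peaks
table(gaps)                     # the gaps vary: the cycle has no fixed length
```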
Example: Beer production
    beer <- window(ausbeer, start = 1992)
    gglagplot(beer)
Example: Beer production
[Figure: lagged scatterplots of beer for lags 1 to 9, with points coloured by quarter (1 to 4)]
Lagged scatterplots
Each graph shows y_t plotted against y_{t-k} for different values of k.
The autocorrelations are the correlations associated with these scatterplots.
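A lag-k scatterplot pairs each y_t with the value k periods earlier, y_{t-k}. A base-R sketch of that pairing, applied to the built-in quarterly `UKgas` series as a stand-in for `beer`; `stats::lag.plot()` then draws the whole grid of lags 1 to 9 in one call.

```r
x <- as.numeric(UKgas)
k <- 4                          # lag to inspect (one year for quarterly data)
n <- length(x)

# Pairs (y_t, y_{t-k}) for t = k+1, ..., n
pairs_k <- cbind(y_t = x[(k + 1):n], y_t_minus_k = x[1:(n - k)])

plot(pairs_k[, "y_t_minus_k"], pairs_k[, "y_t"],
     xlab = "y_{t-k}", ylab = "y_t", main = paste("Lag", k))

# Grid of lagged scatterplots for lags 1 to 9
lag.plot(UKgas, lags = 9, do.lines = FALSE)
```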
Autocorrelation
Results for the first 9 lags for the beer data:
r_1     r_2     r_3     r_4    r_5     r_6     r_7     r_8    r_9
-0.102  -0.657  -0.060  0.869  -0.089  -0.635  -0.054  0.832  -0.108
    ggAcf(beer)
[Figure: correlogram "Series: beer"; x-axis Lag (up to 16), y-axis ACF]
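The r_k values above use the usual sample autocorrelation, r_k = Σ_{t=k+1}^{T} (y_t − ȳ)(y_{t−k} − ȳ) / Σ_{t=1}^{T} (y_t − ȳ)², which is also what base R's `acf()` computes. A sketch that implements the formula directly and checks it against `acf()`; the built-in `UKgas` series stands in for `beer`, so the numbers will not match the table.

```r
# Sample autocorrelation at lag k, computed straight from the formula
r_k <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  d <- y - mean(y)                              # deviations from the overall mean
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

y <- UKgas
manual  <- sapply(1:9, function(k) r_k(y, k))
builtin <- acf(y, lag.max = 9, plot = FALSE)$acf[2:10]  # drop lag 0

round(cbind(lag = 1:9, manual, builtin), 3)
```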
Autocorrelation
- r_4 is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 4 quarters apart.
- r_2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.
- Together, the autocorrelations at lags 1, 2, ... make up the autocorrelation function or ACF.
- The plot is known as a correlogram.
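The sign pattern in r_1 to r_4 can be reproduced with a simulated quarterly series whose peak and trough are two quarters apart. This is made-up data, not the beer series: a repeating pattern (+1, 0, −1, 0) plus a little random noise.

```r
set.seed(123)                                     # reproducible noise
pattern <- rep(c(1, 0, -1, 0), times = 40)        # peak in Q1, trough in Q3
y <- ts(pattern + rnorm(160, sd = 0.3), frequency = 4)

r <- acf(y, lag.max = 8, plot = FALSE)$acf[2:9]   # r_1, ..., r_8
round(r, 2)
# r_2 and r_6 come out strongly negative (peak vs trough, 2 quarters apart),
# r_4 and r_8 strongly positive (peak vs peak, 4 quarters apart),
# and r_1, r_3, r_5, r_7 close to zero.
```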
Trend and seasonality in ACF plots
- When data have a trend, the autocorrelations for small lags tend to be large and positive.
- When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal period, such as lags 12, 24, ... for monthly data).
- When data are trended and seasonal, you see a combination of these effects.
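A simulated illustration with made-up data: a straight-line trend plus noise gives autocorrelations that start near 1 and fall off slowly, because neighbouring values sit on the same side of the mean. Adding a 12-month cosine (an assumed stand-in for monthly seasonality) lifts the ACF at lag 12 well above lag 6 on top of that trend effect.

```r
set.seed(1)
tt    <- 1:120                                # 10 years of monthly time points
noise <- rnorm(120, sd = 5)

trend_only <- ts(tt + noise, frequency = 12)  # linear trend plus noise
trend_seas <- ts(tt + 30 * cos(2 * pi * tt / 12) + noise,
                 frequency = 12)              # same trend plus a 12-month cycle

r_trend <- acf(trend_only, lag.max = 24, plot = FALSE)$acf[-1]  # lags 1..24
r_seas  <- acf(trend_seas, lag.max = 24, plot = FALSE)$acf[-1]

round(r_trend[1:6], 2)      # large and positive, falling off slowly
round(r_seas[c(6, 12)], 2)  # lag 12 (one season) well above lag 6 (half a season)
```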
Aus monthly electricity production
    elec2 <- window(elec, start = 1980)
    autoplot(elec2)
[Figure: time plot of elec2 from 1980; x-axis Time]
Aus monthly electricity production
    ggAcf(elec2, lag.max = 48)
[Figure: correlogram "Series: elec2"; x-axis Lag (0 to 48), y-axis ACF]
Google stock price
    autoplot(goog)
[Figure: time plot of goog; x-axis Time]
Google stock price
    ggAcf(goog, lag.max = 100)
[Figure: correlogram "Series: goog"; x-axis Lag (0 to 100), y-axis ACF]
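Stock prices such as goog are often modelled as roughly a random walk; that is our assumption here, not a claim from the slide. A simulated random walk shows the same slowly decaying ACF: each value is the previous value plus a random step, so values many days apart are still strongly correlated.

```r
set.seed(2017)
walk <- ts(cumsum(rnorm(1000)))   # random walk: each value = previous value + random step

r <- acf(walk, lag.max = 100, plot = FALSE)$acf[-1]   # lags 1..100
round(r[c(1, 10, 50, 100)], 2)    # close to 1 at lag 1, decaying only slowly
```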
Which is which?
1. Daily temperature of cow
2. Monthly accidental deaths
3. Monthly air passengers
4. Annual mink trappings
[Figure: time plots of the four series above, and four ACF plots labelled A, B, C and D; match each series to its ACF]
Lab Session 2