Forecasting electricity generation - ARIMA (R)
Aug 2021 by Francisco Juretig
In this example, we will use the famous auto.arima() function to select the best ARIMA model for us.
The objective is to forecast the daily electricity generation (in GWh) of a company in Argentina, based on data from the previous years.
The ARIMA approach consists of:
- removing the trend from the series
- checking whether the series is stationary and, if it is not, differencing it
- plotting the autocorrelation and partial autocorrelation functions of the resulting series
- proposing an AR, MA or ARIMA model
- checking that the residuals of the ARIMA model are stationary and show no remaining structure
- combining the trend and the estimated ARIMA model to make predictions
This usually involves a lot of manual work, and choosing the best model (steps 4 and 5 above) is particularly difficult. The auto.arima function from the forecast package is meant to do this automatically for us. Conceptually, we just need to provide a time series, and auto.arima will search over candidate models and pick the best one according to an information criterion (AICc by default).
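For comparison, the manual steps listed above can be sketched with base R only. This is a minimal illustration on a simulated series (a linear trend plus AR(1) noise), not the article's data, and the chosen AR(1) order and the lag of 10 in the test are assumptions made for the example.

```r
# Simulated stand-in series (NOT the article's data): linear trend + AR(1) noise
set.seed(42)
n <- 300
time_index <- seq_len(n)
noise <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y_values <- 0.05 * time_index + noise

# Step 1: remove the trend with a linear regression on time
trend_fit <- lm(y_values ~ time_index)
detrended <- ts(residuals(trend_fit))

# Steps 2-3: inspect the ACF and PACF of the detrended series;
# if the ACF decayed very slowly we would difference with diff(detrended)
acf(detrended, main = "ACF of detrended series")
pacf(detrended, main = "PACF of detrended series")

# Step 4: propose a model (an AR(1) is assumed here for illustration)
manual_fit <- arima(detrended, order = c(1, 0, 0))

# Step 5: Ljung-Box test on the model residuals
# (fitdf = number of estimated AR + MA coefficients)
lb <- Box.test(residuals(manual_fit), lag = 10, type = "Ljung-Box", fitdf = 1)
print(lb)

# Step 6: forecast = extrapolated trend + ARMA forecast of the detrended part
h <- 14
future_time <- data.frame(time_index = (n + 1):(n + h))
trend_fc <- predict(trend_fit, newdata = future_time)
arma_fc <- predict(manual_fit, n.ahead = h)$pred
manual_forecast <- as.numeric(trend_fc) + as.numeric(arma_fc)
print(round(manual_forecast, 2))
```

Every one of these steps involves a judgment call (how to detrend, whether to difference, which orders to try), which is the work auto.arima tries to take off our hands.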
Here we will use prython as our IDE, since it lets us separate our code cleanly. It is a new R and Python IDE in which you put your code in panels that can be connected to each other. Inside each panel you can use any R or Python code you would normally use.
Here we just load the data. As you can see, it gets automatically shown in a table next to the panel.
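As a rough idea of what such a loading step can look like in plain R, here is a sketch. The file, the column names (`date`, `generation_gwh`) and the weekly frequency are all hypothetical placeholders, since the actual data file is not reproduced in this text; a small CSV is generated first so that the snippet runs on its own.

```r
# Hypothetical file and column names: placeholders, not the article's real data.
# A small CSV is written to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
set.seed(1)
demo <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 28),
  generation_gwh = round(100 + rnorm(28, sd = 5), 1)
)
write.csv(demo, csv_path, row.names = FALSE)

# Load the data and parse the date column
energy <- read.csv(csv_path, stringsAsFactors = FALSE)
energy$date <- as.Date(energy$date)

# Turn the values into a ts object;
# frequency = 7 ASSUMES a weekly cycle in daily data
generation <- ts(energy$generation_gwh, frequency = 7)

print(head(energy))
str(generation)
```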
How do we run it? Each panel has three running modes, shown as the blue carets next to the Python button (this button is used to switch between R and Python). The first mode runs only that panel, the second runs everything up to that panel (meaning everything that is connected to IN), and the third runs everything that is connected to OUT (meaning everything that uses the objects defined in this panel).
This is actually part of a bigger project meant to show several things about ARIMA models. Let's focus on the branch shown here. In this branch (under the auto.arima approach title) we have three panels (two shown here). In the first one, we use the decompose function to split the series into a trend, a seasonal component, and a random component. Ideally, we would like to see a linear trend; here we have a nonlinear one, which probably indicates that there are unobserved components.
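To make the decomposition step concrete, here is a minimal sketch of decompose() on a simulated series. The data, the weekly pattern and the frequency of 7 are assumptions for illustration only; the seasonal frequency used in the actual project is not stated here.

```r
# Simulated daily series (NOT the article's data):
# level + linear trend + weekly pattern + noise
set.seed(7)
n_weeks <- 20
day <- seq_len(7 * n_weeks)
weekly_pattern <- rep(c(5, 6, 6, 5, 4, -12, -14), times = n_weeks)
values <- 100 + 0.05 * day + weekly_pattern + rnorm(length(day), sd = 2)

# decompose() needs a ts with frequency > 1; frequency = 7 is an assumption
generation <- ts(values, frequency = 7)

# Additive decomposition (the default): observed = trend + seasonal + random
parts <- decompose(generation)
plot(parts)

# The estimated seasonal effect for each position in the cycle
print(round(parts$figure, 2))

# The trend is a centered moving average, so it is NA at both ends
print(sum(is.na(parts$trend)))
```

Plotting `parts$trend` on its own is a quick way to judge whether the trend looks roughly linear or not.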
The second panel shows the auto.arima approach. We literally just call the auto.arima() function, and then we can predict very easily. The plot function plots the forecast along with its prediction intervals. Pay attention to the output printed here: the best model is an ARIMA with AR=2, MA=0, I=0, and a seasonal part with SAR=2.
The drift is the trend that was automatically estimated.
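The calls involved can be sketched as follows. This is a minimal example on a simulated weekly-seasonal series, not the article's data, so the model it selects will generally differ from the one reported above; the horizon of 28 days and the frequency of 7 are assumptions.

```r
library(forecast)

# Simulated daily series (NOT the article's data):
# level + linear trend + weekly pattern + AR(1) noise
set.seed(123)
n_weeks <- 30
day <- seq_len(7 * n_weeks)
weekly_pattern <- rep(c(5, 6, 6, 5, 4, -12, -14), times = n_weeks)
noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = length(day), sd = 2))
generation <- ts(100 + 0.05 * day + weekly_pattern + noise, frequency = 7)

# Let auto.arima search for the model orders
fit <- auto.arima(generation)
print(fit)              # selected orders, coefficients, information criteria
print(arimaorder(fit))  # the orders as a named vector

# Forecast 28 steps ahead and plot the point forecast with its intervals
fc <- forecast(fit, h = 28)
plot(fc)

# Point forecasts and the 80% / 95% interval bounds
print(head(fc$mean, 7))
print(head(fc$lower, 3))
print(head(fc$upper, 3))
```

The printed model summary is where the AR, MA, differencing and seasonal orders, and any drift term, can be read off.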
Finally, we check whether the residuals have any remaining structure. This is easy to evaluate. We call the checkresiduals() function to do a Ljung-Box test (the null hypothesis is that the residual autocorrelations up to the tested lag are all zero, that is, the residuals behave like white noise). As you can see here, we don't reject the null hypothesis. This means that we find no evidence of remaining structure in the residuals.
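A minimal sketch of this check, again on simulated data rather than the article's series: an AR(2) process is simulated and fitted with the matching order, so the residuals should look like white noise. The lag of 10 in the base R call is an assumption, and checkresiduals() may choose a different lag by default.

```r
library(forecast)

# Simulated AR(2) series (NOT the article's data), fitted with the matching order
set.seed(99)
y <- ts(as.numeric(arima.sim(model = list(ar = c(0.5, 0.2)), n = 300)))
fit <- Arima(y, order = c(2, 0, 0))

# Residual time plot, ACF and histogram, plus a printed Ljung-Box test
checkresiduals(fit)

# A comparable Ljung-Box test with base R;
# fitdf = 2 accounts for the two estimated AR coefficients
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb$p.value)
```

A large p-value (commonly, above 0.05) means the null hypothesis of uncorrelated residuals is not rejected; a small one suggests the model has left some structure unexplained.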
Prefer a video?
Here we use the auto.arima function, and we also fit an ARIMA model manually.
You can also download the code, project and files from this GitHub repo: https://github.com/fjuretig/amazing_data_science_projects.git