The FTSE 100 and financial time series

12 January 2010 | Tagged as: ARIMA neural-networks

What are Share Indices?

A share index is a weighted average of the share prices of the leading companies in a market, selected on the basis of their market capitalisation. Market capitalisation is a company's share price multiplied by the number of shares it has in issue. This figure is used as a weighting, so movements in the share prices of larger companies have a greater effect on the index than movements in those of smaller companies.
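To make the weighting concrete, here is a minimal Python sketch of a capitalisation-weighted index. The company names, prices and share counts are invented, and the single divisor used to set a base level is a simplifying assumption: real index providers apply further adjustments that are not modelled here.

```python
# Simplified capitalisation-weighted index. All data are hypothetical, and real
# indices apply further adjustments that this sketch does not model.
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    price: float   # share price
    shares: float  # number of shares in issue


def market_cap(c: Company) -> float:
    """Market capitalisation = share price x number of shares."""
    return c.price * c.shares


def index_level(companies: list[Company], divisor: float) -> float:
    """Index level = total market capitalisation / divisor."""
    return sum(market_cap(c) for c in companies) / divisor


def weights(companies: list[Company]) -> dict[str, float]:
    """Each company's share of total capitalisation, i.e. its weight in the index."""
    total = sum(market_cap(c) for c in companies)
    return {c.name: market_cap(c) / total for c in companies}


base = [Company("Large plc", 10.0, 900.0), Company("Small plc", 10.0, 100.0)]
# Choose the divisor so that the index starts at a base level of 1000
divisor = sum(market_cap(c) for c in base) / 1000.0

big_up = [Company("Large plc", 11.0, 900.0), Company("Small plc", 10.0, 100.0)]
small_up = [Company("Large plc", 10.0, 900.0), Company("Small plc", 11.0, 100.0)]

print(index_level(base, divisor))      # 1000.0
print(index_level(big_up, divisor))    # 1090.0: 10% rise in the large company
print(index_level(small_up, divisor))  # 1010.0: the same 10% rise in the small company
```

The same percentage move shifts the index nine times as much when it happens in the company holding 90% of the capitalisation.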

Investing in these indices, typically through index-tracking funds, has become increasingly popular in recent years. Share indices give individuals and organisations a way to invest in the overall market movement rather than take the higher risk of selecting individual securities.

The companies that make up these indices come from all over the business spectrum. As a result of this diversity and also due to additional influences exerted by other markets, there are a great many factors that can affect their value. This makes the task of forecasting their movements an extremely difficult one. Even so, much time and effort is spent by institutions and investors analysing their behaviour.

The FTSE 100 index

The first index calculated for shares traded on the London Stock Exchange was the Financial Times Ordinary Share Index. It began in 1935 and was based on the top 30 companies.

As time went on, the 30-share Ordinary Share Index became an inadequate measure, so the Financial Times Stock Exchange 100 Index was formed in 1984, increasing the number of companies listed to 100. The index began at a base level of 1000 and represents 77% of the capitalisation of the whole market. It is calculated every fifteen seconds from 8.30am to 4.30pm.

The figure shows how the index closing price has fluctuated between 26/09/2001 and 02/07/2004.

[Figure: Financial Time Series]

Financial Time Series Analysis

As the values of share indices rise and fall over time, we can view this combination of price change and time as a "financial time series". A time series is a collection of observations made sequentially through time. It is said to be continuous when observations are made continuously through time, and discrete when observations are taken at specific times, usually equally spaced.

These definitions apply to the movements of the FTSE 100 closing price over time (daily closing prices form a discrete series), so the available techniques for analysis and prediction can be used. Traditional methods of analysis are primarily concerned with decomposing the variations in the time series into components representing long-term trends, seasonal variations and other cyclical variations. The figure below shows an example of seasonal and long-term variations.

[Figure: Seasonal]
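A classical additive decomposition can be sketched in Python with NumPy. This is one standard textbook version (a centred moving average for the trend, then averaged detrended values for the seasonal part), assumed here for illustration; it is not necessarily how the figure was produced, and the synthetic series is made up.

```python
import numpy as np


def centred_moving_average(x, period):
    """Trend estimate: a centred moving average spanning one full season (NaN at the ends)."""
    x = np.asarray(x, dtype=float)
    half = period // 2
    if period % 2 == 1:
        w = np.ones(period) / period
    else:
        # For an even period use a 2 x period average: half weight on both end points
        w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(len(x), np.nan)
    for t in range(half, len(x) - half):
        trend[t] = np.dot(w, x[t - half:t + half + 1])
    return trend


def additive_decompose(x, period):
    """Split x into trend + seasonal + residual (classical additive decomposition)."""
    x = np.asarray(x, dtype=float)
    trend = centred_moving_average(x, period)
    detrended = x - trend
    # Average the detrended values at each position within the season
    season = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    season -= season.mean()               # seasonal effects sum to zero over one season
    seasonal = np.resize(season, len(x))  # repeat the pattern along the whole series
    residual = x - trend - seasonal
    return trend, seasonal, residual


# Synthetic example: linear trend plus a repeating 4-period pattern (made-up numbers)
t = np.arange(48)
pattern = np.array([5.0, -2.0, -4.0, 1.0])
x = 100 + 0.5 * t + pattern[t % 4]
trend, seasonal, residual = additive_decompose(x, period=4)
print(seasonal[:4])                 # recovers the pattern 5, -2, -4, 1
print(np.nanmax(np.abs(residual)))  # essentially zero for this noise-free series
```

On real index data the residual would not vanish; it holds the "additional fluctuations" discussed next.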

Once these variations have been removed, additional fluctuations usually remain. Whether or not these are random, however, is a contentious issue.

Fundamental analysts study economic and political features, everything that makes prices what they are. They believe that there are far too many random influencing factors, so these fluctuations must be down to the price following a random walk. In other words, the price at time t equals the price at time t-1 plus some random element.
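The random-walk idea, P_t = P_{t-1} + e_t, can be simulated in a few lines of Python. The starting level of 4400 and the shock standard deviation of 35 are arbitrary choices for illustration, not estimates from the data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 250
start = 4400.0                          # arbitrary starting level
shocks = rng.normal(0.0, 35.0, size=n)  # the random element e_t (arbitrary scale)
prices = start + np.cumsum(shocks)      # P_t = P_{t-1} + e_t, with P_0 = start + e_0

# With zero-mean shocks, the expected value of tomorrow's price is today's price
naive_forecast = prices[-1]

# First-differencing a random walk gives back the random shocks
print(np.allclose(np.diff(prices), shocks[1:]))  # True
```

This is why, under the random-walk view, the differenced series should look like pure noise with no structure left to model.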

Technical analysts, on the other hand, are more concerned with the study of the market itself and with trends in the price and traded volume. They use techniques such as Autoregressive (AR) and Moving Average (MA) modelling to try to find additional shorter-term trends in this remaining data.

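AR refers to using the series' own past values to predict its next value. MA is often described as smoothing the data with an average of the past n days; in the Box-Jenkins models described below, however, a moving average term expresses the current value as a weighted combination of past random shocks (forecast errors).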
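The difference between an MA term in a Box-Jenkins model and an n-day smoothing average can be seen in a short Python sketch. The coefficient 0.5 and the four-value shock sequence are made up, and the function names are hypothetical helpers.

```python
import numpy as np


def simulate_ar1(phi, eps):
    """AR(1): y_t = phi * y_{t-1} + e_t, so the series is predicted from its own past."""
    y = np.zeros(len(eps))
    y[0] = eps[0]
    for t in range(1, len(eps)):
        y[t] = phi * y[t - 1] + eps[t]
    return y


def simulate_ma1(theta, eps):
    """MA(1): y_t = e_t + theta * e_{t-1}, a combination of current and past shocks."""
    eps = np.asarray(eps, dtype=float)
    y = eps.copy()
    y[1:] += theta * eps[:-1]
    return y


def trailing_average(x, n):
    """n-day smoothing: mean of each run of n consecutive values (a filter, not an MA(q) model)."""
    return np.convolve(np.asarray(x, dtype=float), np.ones(n) / n, mode="valid")


eps = np.array([1.0, -2.0, 0.5, 3.0])
print(simulate_ar1(0.5, eps))                # values 1, -1.5, -0.25, 2.875
print(simulate_ma1(0.5, eps))                # values 1, -1.5, -0.5, 3.25
print(trailing_average([1, 2, 3, 4, 5], 3))  # values 2, 3, 4
```

An AR(1) shock echoes forever with decaying weight, whereas an MA(1) shock affects only the current and the next value; the smoothing average, by contrast, operates on the observed data rather than on shocks.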

A Traditional Methodology for Forecasting Time Series

Box and Jenkins developed a systematic approach for identifying forecasting models that combine AR and MA terms. Known as Autoregressive Integrated Moving Average (ARIMA) modelling, it has three stages: model identification, estimation and validation. The "Integrated" part refers to the differencing used to make a series stationary.

Model Identification

The first step in the process is to determine whether the series is stationary and/or seasonal. A series is said to be stationary if its mean, variance and autocorrelation remain constant over time; in practice this means the series fluctuates around a constant level rather than trending. A common way to make a series stationary is to take differences: the first difference is the change from one observation to the next, y_t - y_{t-1}. The table below shows a sample of index values together with their first differences.

Date Index Value 1st Diff Date Index Value 1st Diff
10/02/2004 4404.95 * 22/04/2004 4571.83 31.963
11/02/2004 4396.05 -8.900 23/04/2004 4569.95 -1.876
12/02/2004 4377.73 -18.319 26/04/2004 4571.85 1.891
13/02/2004 4412.01 34.283 27/04/2004 4575.68 3.838
16/02/2004 4408.12 -3.894 28/04/2004 4524.48 -51.204
17/02/2004 4461.49 53.375 29/04/2004 4519.53 -4.947
18/02/2004 4442.90 -18.591 30/04/2004 4489.69 -29.844
19/02/2004 4515.57 72.663 04/05/2004 4547.23 57.545
20/02/2004 4515.04 -0.523 05/05/2004 4569.53 22.298
23/02/2004 4524.31 9.271 06/05/2004 4516.17 -53.362
24/02/2004 4496.76 -27.551 07/05/2004 4498.37 -17.798
25/02/2004 4507.55 10.783 10/05/2004 4395.16 -103.210
26/02/2004 4515.89 8.341 11/05/2004 4454.72 59.564
27/02/2004 4492.21 -23.672 12/05/2004 4412.93 -41.794
01/03/2004 4537.00 44.789 13/05/2004 4453.81 40.880
02/03/2004 4540.11 3.111 14/05/2004 4441.79 -12.019
03/03/2004 4525.13 -14.982 17/05/2004 4403.02 -38.771
04/03/2004 4559.07 33.939 18/05/2004 4414.41 11.391
05/03/2004 4547.08 -11.990 19/05/2004 4471.80 57.394
08/03/2004 4553.75 6.670 20/05/2004 4428.71 -43.094
09/03/2004 4542.01 -11.744 21/05/2004 4431.43 2.724
10/03/2004 4545.33 3.327 24/05/2004 4428.87 -2.561
11/03/2004 4445.22 -100.115 25/05/2004 4418.00 -10.872
12/03/2004 4467.35 22.130 26/05/2004 4438.29 20.283
15/03/2004 4412.93 -54.418 27/05/2004 4453.62 15.334
16/03/2004 4428.90 15.964 28/05/2004 4430.69 -22.931
17/03/2004 4456.80 27.903 01/06/2004 4422.68 -8.009
18/03/2004 4397.87 -58.931 02/06/2004 4422.80 0.118
19/03/2004 4417.74 19.867 03/06/2004 4435.41 12.609
22/03/2004 4333.77 -83.966 04/06/2004 4454.45 19.042
23/03/2004 4318.51 -15.259 07/06/2004 4491.60 37.150
24/03/2004 4309.45 -9.065 08/06/2004 4504.83 13.227
25/03/2004 4373.63 64.188 09/06/2004 4489.47 -15.352
26/03/2004 4357.53 -16.104 10/06/2004 4486.10 -3.369
29/03/2004 4406.73 49.200 11/06/2004 4483.96 -2.150
30/03/2004 4412.82 6.094 14/06/2004 4433.17 -50.782
31/03/2004 4385.67 -27.150 15/06/2004 4458.61 25.441
01/04/2004 4410.71 25.040 16/06/2004 4491.13 32.515
02/04/2004 4465.61 54.894 17/06/2004 4493.29 2.162
05/04/2004 4480.70 15.090 18/06/2004 4505.81 12.515
06/04/2004 4472.82 -7.879 21/06/2004 4502.18 -3.629
07/04/2004 4468.69 -4.130 22/06/2004 4468.49 -33.682
08/04/2004 4489.67 20.983 23/06/2004 4486.73 18.232
13/04/2004 4515.78 26.108 24/06/2004 4503.19 16.460
14/04/2004 4485.42 -30.357 25/06/2004 4494.05 -9.136
15/04/2004 4505.50 20.078 28/06/2004 4518.68 24.631
16/04/2004 4537.28 31.775 29/06/2004 4512.41 -6.270
19/04/2004 4546.22 8.940 30/06/2004 4464.07 -48.343
20/04/2004 4569.02 22.803 01/07/2004 4424.72 -39.345
21/04/2004 4539.87 -29.152 02/07/2004 4407.40 -17.322
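The first differences in the table can be reproduced with NumPy's `np.diff`, shown here for the first five closing values. The index values in the table are rounded to two decimal places, so recomputed differences can differ from the printed ones in the third decimal place.

```python
import numpy as np

# First five closing values from the table above (10/02/2004 to 16/02/2004)
index_values = np.array([4404.95, 4396.05, 4377.73, 4412.01, 4408.12])

# First difference: w_t = y_t - y_{t-1}; the first observation has no difference ("*")
first_diff = np.diff(index_values)
print(np.round(first_diff, 2))  # values -8.9, -18.32, 34.28, -3.89

# Differencing twice (d = 2 in ARIMA terms) is np.diff with n=2
second_diff = np.diff(index_values, n=2)
```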

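Once it has been established that the series is stationary, you need to identify which model (AR, MA or both) your data fits. This is done by examining plots of its Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF). The ACF at lag k measures the correlation between the series and a copy of itself shifted by k periods; the PACF at lag k measures the same correlation after removing the effect of the intermediate lags 1 to k-1.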
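The sample ACF and PACF can be computed directly, as in the Python sketch below. The helper names are hypothetical, the PACF uses the Durbin-Levinson recursion on the sample ACF (one standard method), and the simulated AR(1) series with coefficient 0.7 is an arbitrary test case.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    n, denom = len(d), np.dot(d, d)
    return np.array([np.dot(d[:n - k], d[k:]) / denom for k in range(max_lag + 1)])


def sample_pacf(x, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion on the sample ACF."""
    r = sample_acf(x, max_lag)
    pacf = np.ones(max_lag + 1)
    phi = np.zeros(max_lag + 1)  # phi[1:k] holds the AR(k-1) coefficients
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi[1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[1:k], r[1:k])
        phi_kk = num / den
        new_phi = phi.copy()
        new_phi[1:k] = phi[1:k] - phi_kk * phi[k - 1:0:-1]
        new_phi[k] = phi_kk
        phi, pacf[k] = new_phi, phi_kk
    return pacf


# Example: a simulated AR(1) series with coefficient 0.7
rng = np.random.default_rng(1)
e = rng.normal(size=2000)
y = np.zeros_like(e)
for t in range(1, len(e)):
    y[t] = 0.7 * y[t - 1] + e[t]

acf, pacf = sample_acf(y, 10), sample_pacf(y, 10)
band = 1.96 / np.sqrt(len(y))  # approximate 95% bounds often drawn on ACF/PACF plots
print(np.round(acf[:4], 2))    # 1, then decaying roughly like 0.7, 0.49, 0.34
print(np.round(pacf[:4], 2))   # large spike at lag 1, then near zero
```

This is the autoregressive signature described next: an ACF that decays gradually and a PACF that cuts off after the first lag.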

An ACF with large spikes at initial lags that decay to zero, or a PACF with a large spike at the first and possibly the second lag, indicates an autoregressive process. An ACF with a large spike at the first and possibly the second lag, and a PACF with large spikes at initial lags that decay to zero, indicates a moving average process. If both the ACF and the PACF exhibit large spikes that gradually die out, both autoregressive and moving average processes are present.

The figure below shows the Autocorrelation Function of the first differences of our sample data set.

[Figure: Autocorrelation]

The figure below shows the Partial Autocorrelation Function of the first differences of our sample data set.

[Figure: Partial Autocorrelation]

Estimation

ARIMA is usually written in the form ARIMA(p, d, q), where p is the number of autoregressive terms, d is the number of differences and q is the number of moving average terms. This next step involves choosing these orders and estimating the corresponding coefficients. In general, a set of orders is selected and then the validation step is performed; if the model is found to be unacceptable, new values are tried. Each order is taken to be a number between 0 and 5, with the sum of all three not exceeding 10.
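Written out in standard textbook notation (not the article's own), an ARIMA(p, d, q) model uses the backshift operator B, with B y_t = y_{t-1}, AR coefficients φ_i, MA coefficients θ_j, a constant c and white-noise errors ε_t. Sign conventions for the MA terms differ between packages; Box and Jenkins, and software following them, write the MA polynomial with minus signs.

```latex
% General ARIMA(p, d, q) in backshift notation, B y_t = y_{t-1}
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t

% ARIMA(1, 1, 1), written for the first differences w_t = y_t - y_{t-1}
w_t = c + \phi_1 w_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
```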

The ACF and PACF plots of our sample data show only small spikes throughout. The shortness of these spikes suggests that the differenced data are largely random, which makes identifying a model quite difficult. However, as both plots are very similar, this suggests that both AR and MA elements exist. Through trial and error, the model ARIMA(1, 1, 1) has been chosen.
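As a rough illustration of what estimation does once the orders are fixed, the Python sketch below fits ARIMA(1, 1, 1) by minimising the conditional sum of squared residuals over a coarse grid of (φ, θ) values. Conditional least squares, the grid search and demeaning the differences to absorb the constant are simplifications chosen for clarity; the Minitab output below mentions backforecasts, which this sketch does not use. The helper names are hypothetical, and the simulated "true" values φ = 0.6 and θ = 0.3 are arbitrary.

```python
import numpy as np


def arima111_residuals(y, phi, theta):
    """Conditional residuals of ARIMA(1,1,1), setting the first residual to zero.

    Model on the demeaned first differences w_t: w_t = phi*w_{t-1} + e_t + theta*e_{t-1}.
    """
    w = np.diff(np.asarray(y, dtype=float))
    w = w - w.mean()  # simplification: absorb the constant by demeaning
    e = np.zeros(len(w))
    for t in range(1, len(w)):
        e[t] = w[t] - phi * w[t - 1] - theta * e[t - 1]
    return e[1:]


def fit_arima111_css(y, grid=None):
    """Coarse grid search for the (phi, theta) minimising the conditional sum of squares."""
    if grid is None:
        grid = np.linspace(-0.9, 0.9, 19)
    best_sse, best_phi, best_theta = np.inf, None, None
    for phi in grid:
        for theta in grid:
            sse = np.sum(arima111_residuals(y, phi, theta) ** 2)
            if sse < best_sse:
                best_sse, best_phi, best_theta = sse, phi, theta
    return best_phi, best_theta, best_sse


# Simulated ARIMA(1,1,1) data with phi = 0.6, theta = 0.3
rng = np.random.default_rng(7)
eps = rng.normal(size=1000)
w = np.zeros_like(eps)
for t in range(1, len(eps)):
    w[t] = 0.6 * w[t - 1] + eps[t] + 0.3 * eps[t - 1]
y = 4400.0 + np.cumsum(w)  # integrate the differences back into a level series

phi_hat, theta_hat, sse = fit_arima111_css(y)
print(round(phi_hat, 1), round(theta_hat, 1))  # should land close to 0.6 and 0.3
```

Comparing the minimised sum of squares (and the residual diagnostics) across several candidate orders is one way to make the trial-and-error search more systematic.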

Validation

The final step is to examine the residuals, the part of the data that the model does not explain, which should be just random noise. Plots of the ACF and PACF of the residuals are examined for any large spikes; if any appear, new parameters should be selected.
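Alongside the plots, residual autocorrelation is often summarised with the Ljung-Box statistic Q = n(n+2) Σ r_k² / (n - k) over lags k = 1..m, which is compared with a chi-square distribution whose degrees of freedom are m minus the number of estimated parameters (for example, with 7 degrees of freedom the 5% critical value is about 14.07). The Python sketch below computes Q only; a p-value would need a chi-square survival function such as `scipy.stats.chi2.sf`, left out to keep the sketch dependency-free. The function name is hypothetical.

```python
import numpy as np


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    d = np.asarray(residuals, dtype=float)
    d = d - d.mean()
    n, denom = len(d), np.dot(d, d)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = np.dot(d[:n - k], d[k:]) / denom  # lag-k sample autocorrelation
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q


rng = np.random.default_rng(3)
noise = rng.normal(size=500)  # residuals that really are white noise
trended = np.cumsum(noise)    # residuals with obvious structure left in them
print(ljung_box_q(noise, 12))    # compare with chi-square on 12 degrees of freedom
print(ljung_box_q(trended, 12))  # far larger: the residuals are clearly not noise
```

A small p-value (for example below 0.05) suggests that the residuals still contain autocorrelation, i.e. that the model has not captured all of the structure.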

The figure below shows the Autocorrelation Function of the residuals for our data set.

[Figure: ACF]

The figure below shows the Partial Autocorrelation Function of the residuals for our data set.

[Figure: PACF]

Once it is established that the remaining spikes are due to noise, the model can be used to forecast. The figure below is a time series plot for our data set which shows a 5 day forecast. The forecast values are displayed as red triangles and the upper and lower 95% confidence limits are displayed as blue triangles.

[Figure: Forecast]
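The forecasts and limits for a plain, non-seasonal ARIMA(1, 1, 1) can be sketched in Python as follows. The function name and the coefficient values in the example are hypothetical (they are not Minitab's estimates), the MA term uses the "+θ" sign convention, and the limits assume normal errors with a known σ, ignoring the uncertainty in the estimated coefficients. Point forecasts run the difference equation forward with future shocks set to zero; the limits widen with the horizon because forecast errors accumulate through the ψ-weights.

```python
import numpy as np


def forecast_arima111(y, phi, theta, c, last_resid, sigma, horizon, z=1.96):
    """Point forecasts and approximate 95% limits from a fitted ARIMA(1,1,1).

    Model for the differences w_t = y_t - y_{t-1}:
        w_t = c + phi * w_{t-1} + e_t + theta * e_{t-1},  e_t ~ N(0, sigma^2)
    """
    y = np.asarray(y, dtype=float)
    level, w_prev = y[-1], y[-1] - y[-2]
    forecasts = []
    for h in range(1, horizon + 1):
        # Future shocks have expectation zero; only the last observed residual enters at h = 1
        w_hat = c + phi * w_prev + (theta * last_resid if h == 1 else 0.0)
        level += w_hat
        forecasts.append(level)
        w_prev = w_hat
    # psi-weights of (1 + theta*B) / ((1 - phi*B)(1 - B)) give the forecast error variance
    psi = [1.0]
    for j in range(1, horizon):
        prev2 = psi[j - 2] if j >= 2 else 0.0
        psi.append((theta if j == 1 else 0.0) + (1.0 + phi) * psi[j - 1] - phi * prev2)
    se = sigma * np.sqrt(np.cumsum(np.square(psi)))
    forecasts = np.array(forecasts)
    return forecasts, forecasts - z * se, forecasts + z * se


# Last two closing values from the table; all model parameters are hypothetical
fc, lower, upper = forecast_arima111(
    y=[4424.72, 4407.40], phi=-0.5, theta=-0.4, c=-1.8,
    last_resid=0.0, sigma=34.0, horizon=5,
)
print(np.round(fc, 2), np.round(lower, 2), np.round(upper, 2))
```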

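Listed below is the printout from Minitab showing various key statistics together with the forecast values. Note that the fitted model in this printout also contains a seasonal autoregressive term, a seasonal moving average term and a seasonal difference of order 13, in addition to the non-seasonal AR(1), MA(1) and first-difference terms.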

ARIMA model for IndexValue

Estimates at each iteration

Iteration SSE Params
0 224259 0.100 0.100 0.100 0.100 -1.223
1 167665 0.038 -0.050 0.162 0.249 -1.404
2 158166 -0.112 0.029 0.020 0.385 -1.488
3 148504 -0.262 0.086 -0.123 0.508 -1.569
4 137685 -0.412 0.123 -0.267 0.632 -1.658
5 123858 -0.562 0.116 -0.406 0.766 -1.812
6 110027 -0.631 -0.034 -0.446 0.837 -2.092
7 105695 -0.586 -0.184 -0.378 0.846 -1.983
8 105283 -0.594 -0.229 -0.398 0.850 -1.898
9 105243 -0.569 -0.242 -0.373 0.851 -1.823
10 105239 -0.583 -0.246 -0.391 0.852 -1.832
11 105239 -0.569 -0.247 -0.374 0.852 -1.808
12 105238 -0.580 -0.248 -0.388 0.852 -1.823
13 105238 -0.570 -0.247 -0.376 0.852 -1.809
14 105238 -0.579 -0.248 -0.386 0.852 -1.821
15 105238 -0.571 -0.248 -0.377 0.852 -1.810
16 105238 -0.578 -0.248 -0.385 0.852 -1.819
17 105238 -0.572 -0.248 -0.378 0.852 -1.811
18 105238 -0.577 -0.248 -0.384 0.852 -1.818
19 105238 -0.573 -0.248 -0.379 0.852 -1.812
20 105238 -0.576 -0.248 -0.384 0.852 -1.817
21 105238 -0.573 -0.248 -0.380 0.852 -1.813
22 105238 -0.576 -0.248 -0.383 0.852 -1.817
23 105238 -0.574 -0.248 -0.380 0.852 -1.814
24 105238 -0.575 -0.248 -0.382 0.852 -1.816
25 105238 -0.574 -0.248 -0.381 0.852 -1.814

Final Estimates of Parameters

Type Coef SE Coef T P
AR 1 -0.5744 0.3866 -1.49
SAR 13 -0.2477 0.1260 -1.97
MA 1 -0.3812 0.4293 -0.89
SMA 13 0.852 0.1109 7.69 0.000
Constant -1.8145 0.9366 -1.94 0.056

Differencing: 1 regular, 1 seasonal of order 13
Number of observations: Original series 100, after differencing 86
Residuals: SS = 93938.9 (backforecasts excluded)
MS = 1159.7 DF = 81

Modified Box-Pierce (Ljung-Box) Chi-Square statistic:

Lag 12 24 36 48
Chi-Square 15.7 22.4 50.6 60.7
DF 7 19 31 43
P-Value 0.028 0.266 0.015 0.039

Forecasts from period 100. 95 Percent Limits

Period Forecast Lower Upper Actual
101 4383.90 4317.14 4450.66
102 4372.02 4286.23 4457.80
103 4361.06 4255.64 4466.47
104 4375.08 4255.23 4494.93
105 4379.39 4245.60 4513.17
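The printout's numbers can be cross-checked with a little arithmetic. Assuming normal errors, the one-step 95% limits should be the forecast plus or minus 1.96 times the residual standard deviation, the square root of MS; the Python below checks this for period 101 using values copied from the printout above.

```python
import math

# Values copied from the Minitab printout above
ss, df, ms = 93938.9, 81, 1159.7
lower_101, upper_101, forecast_101 = 4317.14, 4450.66, 4383.90

print(round(ss / df, 1))                      # 1159.7: MS is SS divided by DF
half_width = (upper_101 - lower_101) / 2       # 66.76
print(round((upper_101 + lower_101) / 2, 2))  # 4383.9: the forecast is the midpoint
print(round(half_width / 1.96, 2), round(math.sqrt(ms), 2))  # 34.06 vs 34.05
```

The half-width divided by 1.96 agrees with √MS to within rounding, so the first-period limits are consistent with the reported residual mean square.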
