Augmented Dickey-Fuller Test#

Introduction#

The Augmented Dickey-Fuller (ADF) test is a statistical test for the presence of a unit root in a time series. A series with a unit root is non-stationary: shocks have a permanent effect, so the series behaves like a random walk (possibly with drift) rather than reverting to a fixed mean. This makes future values hard to predict and violates the assumptions of many standard models.

The ADF test is commonly used in econometrics and time series analysis to assess the stationarity of a time series dataset, which is a crucial assumption for many modeling techniques and forecasting methods.

The null hypothesis of the ADF test is that a unit root is present in the time series, indicating non-stationarity. The alternative hypothesis is that the time series is stationary, meaning it does not contain a unit root.
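For reference, the two hypotheses can be stated in terms of the regression that the test estimates. The following is the standard textbook form with a constant and a linear trend; the symbols ($\alpha$, $\beta$, $\gamma$, $\delta_i$, and the lag order $p$) are notation introduced here for illustration and do not appear elsewhere on this page.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0\colon \gamma = 0 \ \text{(unit root)},
\qquad
H_1\colon \gamma < 0 \ \text{(stationary)}.
```

The ADF statistic is the t-ratio of the estimated $\gamma$. The lagged differences $\Delta y_{t-i}$ are the "augmentation": they absorb serial correlation in the errors.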

The ADF test statistic is compared to critical values from the Dickey-Fuller distribution, which differs from the usual Student's t distribution, to determine whether to reject the null hypothesis. The test is left-tailed: if the test statistic is less than (more negative than) the critical value, the null hypothesis is rejected and the time series is considered stationary. If the test statistic is greater than the critical value, the null hypothesis cannot be rejected, which is consistent with the time series being non-stationary.

The ADF test can be performed using statistical software packages such as Python’s statsmodels library or R’s tseries package. It is often used as a preliminary step in time series analysis to ensure that the data satisfies the stationarity assumption before proceeding with further analysis or modeling.

Example#

The following example performs the Augmented Dickey-Fuller (ADF) test in Python. It plots the time series, prints the test results, and plots the test statistic against the critical values:

[1]:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# Generate example data: a random walk (cumulative sum of Gaussian noise),
# which has a unit root by construction
np.random.seed(123)
data = np.cumsum(np.random.randn(100))

# Plot the time series data
plt.figure(figsize=(10, 4))
plt.plot(data)
plt.title('Example Time Series Data')
plt.xlabel('Time')
plt.ylabel('Value')
plt.grid(True)
plt.show()

# Perform Augmented Dickey-Fuller test
result = adfuller(data)
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))

# Plot the ADF statistic (a single value) against the critical values
plt.figure(figsize=(10, 6))
plt.plot(0, result[0], marker='o', linestyle='none', color='blue', label='ADF Statistic')
plt.axhline(y=result[4]['1%'], color='red', linestyle='--', label='1% Critical Value')
plt.axhline(y=result[4]['5%'], color='orange', linestyle='--', label='5% Critical Value')
plt.axhline(y=result[4]['10%'], color='green', linestyle='--', label='10% Critical Value')
plt.title('ADF Test Results')
plt.xticks([])  # the x-axis carries no information for a single statistic
plt.ylabel('Test Statistic Value')
plt.legend()
plt.grid(True)
plt.show()

../../_images/timeseries_adfuller_test_4_0.png

ADF Statistic: -1.629361682107354
p-value: 0.46782485206235236
Critical Values:
        1%: -3.498
        5%: -2.891
        10%: -2.583

../../_images/timeseries_adfuller_test_4_2.png

In this code:

- We first generate example time series data using a cumulative sum of random Gaussian noise.
- We plot the time series data using matplotlib.
- We perform the Augmented Dickey-Fuller test using the adfuller function from statsmodels.tsa.stattools.
- We print out the ADF statistic, p-value, and critical values.
- Finally, we plot the ADF test statistic and critical values to visualize the test results.

This example demonstrates how to use Python to perform the ADF test on time series data and interpret the results using plots and critical values.

Critical Values#

The critical values are pre-calculated thresholds used to judge the significance of the Augmented Dickey-Fuller (ADF) test statistic. They are specific to the ADF test and depend on the sample size, on the deterministic terms included in the test regression (for example, a constant or a constant plus trend), and on the significance level chosen for the test (1%, 5%, or 10%).

The values -3.498, -2.891, and -2.583 in the example output are the critical values for the ADF test statistic at the 1%, 5%, and 10% significance levels, respectively, for the 100-point example series. In statsmodels they are computed from MacKinnon's (2010) approximation to the Dickey-Fuller distribution, and they serve as thresholds for deciding whether the null hypothesis of the ADF test should be rejected.

When interpreting the ADF test results, the test statistic is compared to these critical values. If the test statistic is less than the critical value, the null hypothesis (presence of a unit root) is rejected, indicating that the time series is stationary. Conversely, if the test statistic is greater than the critical value, the null hypothesis cannot be rejected, suggesting that the time series is non-stationary.
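The comparison rule can be written as a small helper. This is a minimal sketch: the function name rejection_levels is hypothetical (it is not part of statsmodels), and the value -3.0 in the second call is a made-up statistic used only to show a rejection.

```python
def rejection_levels(adf_stat, critical_values):
    """Return the significance levels at which the unit-root null is rejected."""
    # Left-tailed test: reject when the statistic is below the critical value
    return [level for level, crit in critical_values.items() if adf_stat < crit]


# Critical values from the example output
critical_values = {'1%': -3.498, '5%': -2.891, '10%': -2.583}

print(rejection_levels(-1.629, critical_values))  # statistic from the example output -> []
print(rejection_levels(-3.0, critical_values))    # hypothetical statistic -> ['5%', '10%']
```

For the example series the list is empty, so the null hypothesis is not rejected at any of the three levels.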

These critical values are widely available in statistical tables or can be calculated using statistical software packages. In Python, the adfuller function from statsmodels.tsa.stattools returns the critical values as part of its output, allowing you to easily interpret the ADF test results.

ADF statistic and p-value#

In the context of the Augmented Dickey-Fuller (ADF) test, the ADF statistic and p-value are key components used to assess the stationarity of a time series. Here’s what each of these values represents:

ADF Statistic: The ADF statistic is the test statistic computed by the ADF test. It measures the strength of evidence against the null hypothesis of a unit root (non-stationarity) in the time series data. If the ADF statistic is more negative (or less positive) than the critical values, it suggests stronger evidence against the presence of a unit root, indicating that the time series is likely stationary.
p-value: The p-value associated with the ADF statistic indicates the probability of observing the ADF statistic (or a more extreme value) under the null hypothesis of a unit root. In other words, it assesses the significance of the ADF statistic. A p-value below a chosen significance level (e.g., 0.05) suggests that the null hypothesis can be rejected, indicating that the time series is likely stationary. Conversely, a p-value above the significance level suggests that there is insufficient evidence to reject the null hypothesis, indicating that the time series may be non-stationary.
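To make the statistic less of a black box, the sketch below computes the simplest (un-augmented) Dickey-Fuller statistic by hand with NumPy: it regresses the first difference on a constant and the lagged level, then takes the t-ratio of the lagged-level coefficient. This is an illustration under simplifying assumptions (no lagged differences, constant only), so it will generally not match adfuller with default settings, and the helper name df_statistic is hypothetical.

```python
import numpy as np


def df_statistic(y):
    """t-ratio on y[t-1] in the regression diff(y)[t] = a + g * y[t-1] + e[t]."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # first differences
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)     # OLS coefficients
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance of beta
    return beta[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(0)  # arbitrary seed
noise = rng.standard_normal(500)

print('white noise (stationary):', df_statistic(noise))
print('random walk (unit root): ', df_statistic(np.cumsum(noise)))
```

The stationary series should produce a strongly negative statistic, while the random walk should produce one much closer to zero.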

In the output of the example above:

ADF Statistic: -1.629361682107354
p-value: 0.46782485206235236

The ADF statistic of about -1.63 is greater (less negative) than all three critical values, including the 10% value of -2.583, so the evidence against a unit root is weak. Consistent with this, the p-value of about 0.47 is far above conventional significance levels (e.g., 0.05). Therefore, we cannot reject the null hypothesis of a unit root, which is the expected result for a series generated as a cumulative sum of Gaussian noise.