Energy Price Forecasting

Time series forecasting of public market auction prices. The project compares different prediction horizon scenarios (hourly and daily), prediction models, and the contribution of feature engineering.

Results overview

Model	Artificial features	Period [h]	Horizon [h]	MAE	RMSE
Prophet	No	1	1	47.66	61.33
Prophet	No	24	24	50.09	64.45
XGBoost	No	1	1	21.08	29.37
XGBoost	No	6	1	16.82	24.28
XGBoost	No	24	1	15.27	21.01
XGBoost	No	1	24	43.15	59.91
XGBoost	No	6	24	46.72	64.49
XGBoost	No	24	24	46.24	62.84
XGBoost	Yes	1	1	17.21	25.54
XGBoost	Yes	6	1	14.87	20.81
XGBoost	Yes	24	1	16.65	23.35
XGBoost	Yes	1	24	46.32	63.47
XGBoost	Yes	6	24	47.19	64.21
XGBoost	Yes	24	24	49.48	65.98
CatBoost	No	1	1	20.88	29.26
CatBoost	No	6	1	15.84	23.37
CatBoost	No	24	1	14.58	20.72
CatBoost	No	1	24	42.30	58.95
CatBoost	No	6	24	43.43	61.04
CatBoost	No	24	24	44.10	59.13
CatBoost	Yes	1	1	16.12	23.80
CatBoost	Yes	6	1
CatBoost	Yes	24	1	15.08	21.46
CatBoost	Yes	1	24	43.16	60.62
CatBoost	Yes	6	24	44.40	59.07
CatBoost	Yes	24	24

Quick start

git clone ...
cd price_forecasting
poetry install
poetry shell
(venv) python prophet_forecasting.py                            # evaluates Prophet model
(venv) python tree_forecasting.py -m catboost -l 24 -z 24       # evaluates tree based models
(venv) python optimization.py                                   # evaluates optimization strategy

Dataset

Hourly electricity prices from the public European energy market over a 6-month time window.

Data sources:

https://www.smard.de (Bundesnetzagentur)
https://transparency.entsoe.eu

Both sources offer downloads as .csv files in various time resolutions.

EDA

Table annotations

Available dataset column names.

MTU (CET/CEST) $\rightarrow$ Time intervals [FROM, TO] in Central European Time (UTC+1, or UTC+2 during daylight saving time)
Day-ahead Price [EUR/MWh] $\rightarrow$ Target price column to be predicted
Currency $\rightarrow$ Price unit Euro (irrelevant for modeling)
BZN|DE-LU $\rightarrow$ Bidding zone Germany/Luxembourg

Market price raw data

Electricity price of the public auction market.

Y Price [€/MWh] & ds [hour]

Observations

Noisy short-term variability alongside long-term repetitive cyclic patterns
External factors seem to cause an unusually large outlier in early March

Time series statistics

ds: Time | y: Price

Preprocessing daily sample distribution

(Log) Price samples per day histogram

Gap localization over time: sample count per day

Data cleansing by interpolation over small time step gaps.
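
The gap filling can be illustrated with a minimal sketch. It assumes that missing hourly prices are marked as `None` and that only short runs are filled linearly between their valid neighbours. The function name `interpolate_small_gaps` and the `max_gap` threshold are hypothetical and not taken from the repository scripts.

```python
def interpolate_small_gaps(values, max_gap=3):
    """Linearly fill runs of None that are at most max_gap long.

    Gaps at the start or end of the series, and gaps longer than
    max_gap, are left untouched (assumption: they need manual review).
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            for k in range(gap):
                # evenly spaced points between the two valid neighbours
                filled[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return filled


if __name__ == "__main__":
    print(interpolate_small_gaps([10.0, None, None, 40.0, None, 50.0]))
```

Limiting the gap length keeps the interpolation from inventing long stretches of prices that were never observed.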

Feature engineering

Two feature configurations have been compared for the tree-based models (XGBoost and CatBoost): the raw time series as the only feature input, in contrast to additionally extracting common metrics from the price.
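
The README does not list the exact artificial features, so the following is only a sketch of what such price-derived metrics could look like. The lagged prices, rolling mean and rolling standard deviation, as well as the name `build_features`, are assumptions made for illustration.

```python
from statistics import mean, pstdev


def build_features(prices, lags=(1, 24), window=24):
    """Derive lag and rolling-window features from a price series.

    Each row only uses values strictly before time t, so the target
    at time t never leaks into its own features.
    """
    rows = []
    start = max(max(lags), window)
    for t in range(start, len(prices)):
        history = prices[t - window:t]
        row = {f"lag_{lag}": prices[t - lag] for lag in lags}
        row["rolling_mean"] = mean(history)
        row["rolling_std"] = pstdev(history)
        row["target"] = prices[t]
        rows.append(row)
    return rows


if __name__ == "__main__":
    demo = [float(v) for v in range(30)]
    print(build_features(demo)[0])
```

The first `max(max(lags), window)` samples are dropped because they have no complete history, which slightly shrinks an already small dataset.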

Features over time

Forecast model (1. Prophet)

POC parametrization

Initial window size in days (120) $\rightarrow$ training window (the bigger the better)
Horizon in hours $\rightarrow$ prediction step size (how far to predict into the future)
Period in hours $\rightarrow$ spacing between consecutive forecast cutoffs, which determines the number of prediction steps

Y [Price in €/MWh] over time

Legend:

Historical observations (black dots)
Confidence interval (light blue band)
Predictions (blue dense line)
Upper threshold limit (dashed black line)

Test set evaluation

Time series cross validation is used to measure the forecast error using historical data. This is done by selecting cutoff points in the history, and for each of them fitting the model using data only up to that cutoff point. The forecasted values (yhat) are compared to the actual (y) values.
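
The cutoff logic can be sketched on plain sample indices. This is a simplified rolling-origin split and not Prophet's own implementation. The function name `rolling_origin_splits` is hypothetical, and placing the cutoffs forward from the initial window is an assumption of this sketch.

```python
def rolling_origin_splits(n_samples, initial, period, horizon):
    """Return (train, test) index ranges for time series cross validation.

    initial: number of samples in the first training window
    period:  spacing between consecutive cutoff points
    horizon: number of samples forecast after each cutoff
    """
    splits = []
    cutoff = initial
    while cutoff + horizon <= n_samples:
        train = range(0, cutoff)                 # data up to the cutoff only
        test = range(cutoff, cutoff + horizon)   # values compared against yhat
        splits.append((train, test))
        cutoff += period
    return splits


if __name__ == "__main__":
    for train, test in rolling_origin_splits(10, initial=4, period=2, horizon=3):
        print(list(train), "->", list(test))
```

Because every training range ends at its cutoff, no future price can influence the fit that is evaluated on the following horizon.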

Prediction & observations over time for daily forecast horizon [Price in €/MWh]

Both metrics are below the price standard deviation of 90.656821, so the forecasts are reasonable, but the errors remain fairly high. The model did derive valuable information from the data, but there is likely considerable potential left in dataset preprocessing and model selection. Most importantly, the model parameters (e.g. sampling strategy) were chosen for quick experimentation rather than optimal results and need further tuning.
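
The comparison against the standard deviation works because a constant forecast at the mean price has an RMSE roughly equal to the standard deviation of the target, so errors below it indicate the model beats that trivial baseline. A minimal sketch of the check follows. The helper names are hypothetical and the sample values are made up for illustration, not taken from the dataset.

```python
from math import sqrt
from statistics import stdev


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def below_target_spread(actual, predicted):
    """True if both errors are below the sample standard deviation of the target."""
    spread = stdev(actual)
    return mae(actual, predicted) < spread and rmse(actual, predicted) < spread


if __name__ == "__main__":
    y = [100.0, 150.0, 200.0]      # illustrative values only
    yhat = [110.0, 140.0, 190.0]
    print(mae(y, yhat), rmse(y, yhat), below_target_spread(y, yhat))
```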

Review

The Prophet model reaches the limits of its capability here: it cannot handle the high-frequency volatility caused by unknown external factors and struggles to model irregular patterns.

Forecast model (2. XGBoost)

Gradient-boosted regression model built from (unbalanced) decision trees, applied to time series forecasting.

Test set evaluation

The test set comprises 696 samples from June (after deducting the samples in the gaps between cross validation splits). Two scenarios are evaluated: hourly and one-day-ahead forecast horizons.

Prediction & observations for daily forecast horizon, no artificial features [Price in €/MWh]

Prediction & observations for hourly forecast horizon, no artificial features [Price in €/MWh]

XGBoost with extended input feature dimensions

Prediction & observations for daily forecast horizon, artificial features added [Price in €/MWh]

Prediction & observations for hourly forecast horizon, artificial features added [Price in €/MWh]

Review

Compared to Prophet, the XGBoost model achieves significantly smaller error metrics on the evaluation sets in the short-term prediction scenario and handles non-seasonal, spiky patterns better. On long-term predictions the tree-based model still outperforms the Prophet model, but by a smaller margin. Extending the dataset with artificial feature dimensions only benefits prediction accuracy in the short-horizon scenario.

Forecast model (3. CatBoost)

CatBoost builds symmetric trees and is known to perform well even without extensive hyperparameter optimization.

Test set evaluation

The test set comprises 696 samples from June (after deducting the samples in the gaps between cross validation splits). Two scenarios are evaluated: hourly and one-day-ahead forecast horizons.

Prediction & observations for daily forecast horizon, no artificial features [Price in €/MWh]

Prediction & observations for hourly forecast horizon, no artificial features [Price in €/MWh]

CatBoost with extended input feature dimensions

Prediction & observations for daily forecast horizon, artificial features added [Price in €/MWh]

Prediction & observations for hourly forecast horizon, artificial features added [Price in €/MWh]

Review

CatBoost slightly outperforms XGBoost and also outperforms the Prophet forecasts. In contrast to XGBoost, it benefits from the extended feature dimensions even on a small dataset.

Overall, the fitting effort for tree-based models is several orders of magnitude smaller than for non-tree-based additive regression models like Prophet. At the same time, both tree-based approaches achieved better results on the test set for energy price forecasting.

Optimization Strategy

Task: Buy energy at cheap prices and store in batteries, to sell at future higher prices.

Problem formalization

Charging speed at time t: $c_t$

Battery constraints

Total capacity (state of charge $SOC_t$): $0 \leq SOC_t \leq 1$ MWh

Charging speed: $-1 \leq c_t \leq 1$ MWh per time step

Trading actions

$c_t > 0 \rightarrow$ Charging/Buy

$c_t = 0 \rightarrow$ Idle/Hold

$c_t < 0 \rightarrow$ Discharging/Sell

State update $SOC_{t+1} = SOC_t + c_t$

Optimizable objective function (profit; buying with $c_t > 0$ costs money, selling with $c_t < 0$ earns it): $\max \sum_t -price_t \cdot c_t$
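
The formalization above can be sketched as a small dynamic program. This is not the content of `optimization.py`. It assumes known (or forecast) hourly prices, an initially empty battery, and actions restricted to full steps $c_t \in \{-1, 0, 1\}$, so that the state of charge is either 0 or 1 MWh. The function name `optimize_schedule` is hypothetical.

```python
def optimize_schedule(prices):
    """Maximize sum_t(-price_t * c_t) for a 1 MWh battery starting empty.

    Returns (profit, actions) with actions[t] in {-1, 0, 1}:
    1 = charge/buy, 0 = idle/hold, -1 = discharge/sell.
    """
    neg_inf = float("-inf")
    best = [0.0, neg_inf]  # best profit ending with SOC 0 (empty) or 1 (full)
    history = []           # per step: (previous state, action) for each end state
    for price in prices:
        new = [neg_inf, neg_inf]
        back = [None, None]
        # (previous SOC, action c_t, profit change, next SOC)
        moves = ((0, 0, 0.0, 0), (1, -1, price, 0), (1, 0, 0.0, 1), (0, 1, -price, 1))
        for prev, action, gain, nxt in moves:
            if best[prev] == neg_inf:
                continue  # previous state not reachable
            value = best[prev] + gain
            if value > new[nxt]:
                new[nxt] = value
                back[nxt] = (prev, action)
        best = new
        history.append(back)
    # pick the better final state, then walk the decisions backwards
    state = 0 if best[0] >= best[1] else 1
    profit = best[state]
    actions = []
    for back in reversed(history):
        prev, action = back[state]
        actions.append(action)
        state = prev
    actions.reverse()
    return profit, actions


if __name__ == "__main__":
    print(optimize_schedule([50.0, 20.0, 80.0, 30.0, 90.0]))
```

With real forecasts the schedule is only as good as the predicted prices, so forecast errors translate directly into lost profit.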
