Module · 5 lessons

Evaluation and Backtesting

One train/test split is not enough to trust a forecasting model. Learn walk-forward validation, TimeSeriesSplit, and honest forecast intervals, and watch backtesting reverse a conclusion from earlier in the course.


At a glance

Level: Intermediate
Lessons: 5
Time to complete: 1 week
Cost: Free forever · no sign-up

Welcome to Evaluation and Backtesting, the eighth module of the Time Series Forecasting course. Every model built since Module 1 was scored the same way: split Cyclepath once, train on the first 84 months, test on the last 12, and report one number. That is a standard way to start, and this course has applied it carefully throughout. But a single split is still just one sample of how a model performs, which raises the question this module sets out to answer: what if you had held out a different year instead?
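
The question can be sketched in a few lines. This is a minimal illustration, not the course's code. It assumes a hypothetical synthetic monthly series standing in for Cyclepath (the real data is not reproduced here). It scores a seasonal-naive forecast, which predicts each month with the same month one year earlier, on every possible held-out year, training each time only on the months before it. The helper names `mape` and `seasonal_naive` are illustrative.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))


def seasonal_naive(train, horizon, season=12):
    """Forecast each future month with the same month from the last observed season."""
    last_season = np.asarray(train, dtype=float)[-season:]
    repeats = int(np.ceil(horizon / season))
    return np.tile(last_season, repeats)[:horizon]


# Hypothetical stand-in for Cyclepath: 96 months of trend, yearly seasonality and noise.
rng = np.random.default_rng(0)
months = np.arange(96)
series = (10_000 + 40 * months
          + 2_000 * np.sin(2 * np.pi * months / 12)
          + rng.normal(0, 400, size=96))

# Hold out each year in turn (years 2-8), training only on the months before it.
# Year 8 (months 84-95) is the single split used so far.
mape_by_year = {}
for year in range(1, 8):
    start = 12 * year
    train, test = series[:start], series[start:start + 12]
    mape_by_year[year + 1] = mape(test, seasonal_naive(train, horizon=12))

for year, score in mape_by_year.items():
    print(f"held-out year {year}: MAPE {score:.2f}%")
spread = max(mape_by_year.values()) - min(mape_by_year.values())
print(f"spread across held-out years: {spread:.2f} percentage points")
```

The spread printed at the end is the quantity a single split hides. On synthetic data its size means nothing in particular. The point is that it is not zero.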

You will start by measuring how much a model’s reported error depends on which single year is held out. The variation is real and uncomfortably large. You will then build walk-forward validation, which tests a model at several points in time instead of one: first by hand with an expanding and a rolling window, then formalized with scikit-learn’s TimeSeriesSplit. You will also check whether a model’s forecast intervals are honest by measuring empirical coverage across many backtested forecasts, rather than taking a stated confidence level at face value. The guided project backtests every model this course has built (seasonal-naive, SARIMA, and Holt-Winters) across six separate origins. The result changes which model looks best, overturning the single-split conclusion reported in Modules 6 and 7.
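
Walk-forward validation with TimeSeriesSplit can be sketched roughly as follows. This sketch rests on several assumptions. It uses a 96-month synthetic stand-in for Cyclepath and six folds with a 12-month test block each. It also uses a hypothetical 24-month cap for the rolling window. The course's actual fold settings may differ. Seasonal-naive is used as the model because it needs no fitting library. SARIMA or Holt-Winters would be refit inside the same loop.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the 96-month Cyclepath series (not the course's data).
rng = np.random.default_rng(0)
months = np.arange(96)
series = (10_000 + 40 * months
          + 2_000 * np.sin(2 * np.pi * months / 12)
          + rng.normal(0, 400, size=96))


def seasonal_naive(train, horizon, season=12):
    """Forecast each future month with the same month from the last observed season."""
    last_season = train[-season:]
    return np.tile(last_season, int(np.ceil(horizon / season)))[:horizon]


def walk_forward_mape(y, splitter):
    """Re-forecast from every origin the splitter yields; return one MAPE per fold."""
    errors = []
    for train_idx, test_idx in splitter.split(y):
        train, test = y[train_idx], y[test_idx]
        forecast = seasonal_naive(train, horizon=len(test))
        errors.append(100 * np.mean(np.abs((test - forecast) / test)))
    return np.array(errors)


# Expanding window: each fold trains on every month before its 12-month test block.
expanding = TimeSeriesSplit(n_splits=6, test_size=12)
# Rolling window: the same six origins, but training is capped at the latest 24 months.
rolling = TimeSeriesSplit(n_splits=6, test_size=12, max_train_size=24)

for name, splitter in [("expanding", expanding), ("rolling", rolling)]:
    print(name)
    for train_idx, test_idx in splitter.split(series):
        print(f"  train months {train_idx[0]}-{train_idx[-1]}, "
              f"test months {test_idx[0]}-{test_idx[-1]}")

expanding_errors = walk_forward_mape(series, expanding)
print(f"per-origin MAPE: {np.round(expanding_errors, 2)}")
print(f"mean {expanding_errors.mean():.2f}%, std {expanding_errors.std():.2f}")
```

Under these assumptions, the six test blocks start at months 24, 36, 48, 60, 72 and 84. The last expanding fold (train on months 0-83, test on 84-95) reproduces the original single split. Only the expanding errors are scored here. Seasonal-naive reads just the final 12 training months, so the rolling cap would not change its forecasts. The two windows diverge only for a fitted model such as SARIMA.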

Every backtest, every fold, and every coverage check in this module is computed for real on the same seeded Cyclepath series used throughout the course, refitting each model from scratch at every origin. Start with Lesson 1 on why one split was never quite enough.

Lessons in this module

1 Why One Split Is Not Enough

Every model in this course was scored on one held-out year. Hold out a different year instead, and the seasonal-naive baseline's MAPE swings from 5.9% to as high as 7.4%, a real difference that a single split cannot reveal.

2 Walk-Forward Validation

Walk-forward validation tests a model at several points in time instead of one, refitting as the training window moves forward. The loop is built by hand on Cyclepath with both an expanding and a rolling window. Across six origins, SARIMA's error averages 2.14%, with real variation from origin to origin.

3 TimeSeriesSplit and Formalizing the Loop

scikit-learn's TimeSeriesSplit generates exactly the same six origins built by hand in Lesson 2. Formalizing the walk-forward loop this way turns backtesting into a single, checkable line instead of hand-written index arithmetic.

4 Forecast Intervals and Their Honesty

A 95% forecast interval should contain the actual value about 95% of the time. Backtested across 36 forecasts, SARIMA's interval contains the actual value 100% of the time at a width of about 2,145 trips. A plain random-walk model's interval contains it only 83.3% of the time despite being wider, an honest failure a single test set would never reveal.

5 Guided Project: Backtesting Every Model From the Course

Backtest seasonal-naive, SARIMA, and Holt-Winters across the same six origins, and watch the single-split ranking from Modules 6 and 7 get overturned. Once properly backtested, Holt-Winters wins on both average accuracy and stability.
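
Checking interval honesty comes down to counting how many backtested actuals land inside their intervals. The sketch below illustrates this under several assumptions. It uses a synthetic stand-in series and six 12-month folds, which gives 72 forecasts here rather than the course's 36. It also uses a random-walk interval built with a common normal approximation: the last value ± 1.96·σ·√h, where σ is the standard deviation of one-step differences and h is the forecast horizon. The course's own interval construction is not reproduced, and the function names are illustrative.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def empirical_coverage(actual, lower, upper):
    """Share of actual values that fall inside their forecast interval (bounds inclusive)."""
    actual, lower, upper = (np.asarray(a, dtype=float) for a in (actual, lower, upper))
    return float(np.mean((actual >= lower) & (actual <= upper)))


def random_walk_interval(train, horizon, z=1.96):
    """Random-walk point forecast with a normal-approximation interval widening with sqrt(h)."""
    sigma = np.std(np.diff(train), ddof=1)
    steps = np.arange(1, horizon + 1)
    point = np.full(horizon, train[-1], dtype=float)
    half_width = z * sigma * np.sqrt(steps)
    return point, point - half_width, point + half_width


# Hypothetical stand-in series (not the course's Cyclepath data).
rng = np.random.default_rng(0)
months = np.arange(96)
series = (10_000 + 40 * months
          + 2_000 * np.sin(2 * np.pi * months / 12)
          + rng.normal(0, 400, size=96))

# Collect every backtested forecast and its interval across the walk-forward origins.
actuals, lowers, uppers = [], [], []
for train_idx, test_idx in TimeSeriesSplit(n_splits=6, test_size=12).split(series):
    _, fold_lower, fold_upper = random_walk_interval(series[train_idx], horizon=len(test_idx))
    actuals.append(series[test_idx])
    lowers.append(fold_lower)
    uppers.append(fold_upper)

actual = np.concatenate(actuals)
lower = np.concatenate(lowers)
upper = np.concatenate(uppers)

coverage = empirical_coverage(actual, lower, upper)
print(f"{actual.size} backtested forecasts")
print(f"empirical coverage of nominal 95% interval: {coverage:.1%}")
print(f"mean interval width: {np.mean(upper - lower):,.0f}")
```

The printed coverage is the number to compare against the nominal 95%. Pooling forecasts from every origin is what makes that comparison meaningful. A single 12-month test set gives too few hits and misses to judge.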

Achievement

Complete all 5 lessons to finish the Evaluation and Backtesting module.


DATATWEETS
