One train/test split is not enough to trust a forecasting model. Learn walk-forward validation, TimeSeriesSplit, and honest forecast intervals, and watch backtesting reverse a conclusion from earlier in the course.
Welcome to Evaluation and Backtesting, the eighth module of the course. Every model built since Module 1 has been scored the same way: split Cyclepath once, train on the first 84 months, test on the last 12, and report a single error number. That is a standard way to start, and the course has applied it carefully throughout. But a single split is still only one sample of how a model performs, and this module asks the question that follows directly from that: what if you had held out a different year instead?
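To make "hold out a different year" concrete, here is a minimal Python sketch. It assumes a seasonal-naive model (repeat the last 12 observed months) and a synthetic 96-month series standing in for Cyclepath. The series, the helper names `seasonal_naive_forecast` and `holdout_mae`, and the choice of MAE are illustrative assumptions, not the course's code. Origin 84 reproduces the single 84/12 split; each other origin holds out an earlier year.

```python
import numpy as np

def seasonal_naive_forecast(train, horizon=12, season=12):
    """Seasonal-naive: repeat the last observed season forward."""
    last_season = train[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

def holdout_mae(y, origin, horizon=12, season=12):
    """Train on y[:origin], test on the next `horizon` points, return the MAE."""
    train, test = y[:origin], y[origin:origin + horizon]
    forecast = seasonal_naive_forecast(train, horizon, season)
    return float(np.mean(np.abs(test - forecast)))

# Hypothetical stand-in for Cyclepath (NOT the course's data):
# 96 monthly values = trend + yearly seasonality + noise.
rng = np.random.default_rng(0)
months = np.arange(96)
y = 200 + 1.5 * months + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 15, 96)

# The single split: train on 84 months, test on the last 12.
print("single split MAE:", round(holdout_mae(y, 84), 2))

# Holding out a different year instead: one score per origin.
for origin in (48, 60, 72, 84):
    print("origin", origin, "MAE", round(holdout_mae(y, origin), 2))
```

Because each origin scores a different 12 months, the MAE values generally differ; that spread is what this module measures on the real series.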
You will start by measuring exactly how much a model’s reported error depends on which single year gets held out; the variation is real and uncomfortably large. You will then build walk-forward validation, which tests a model at several points in time instead of one, first by hand with an expanding and a rolling window, then formalized with scikit-learn’s TimeSeriesSplit. Next you will check whether a model’s forecast intervals are honest by measuring their empirical coverage across many backtested forecasts, rather than taking a stated confidence level at face value. The guided project backtests every model this course has built (seasonal-naive, SARIMA, and Holt-Winters) across six separate forecast origins, and the result changes which model looks best, directly overturning the single-split conclusion reported in Modules 6 and 7.
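Walk-forward validation and an empirical coverage check can be sketched with scikit-learn's `TimeSeriesSplit`. This is a minimal illustration under stated assumptions: a synthetic 96-month stand-in series (not Cyclepath), a seasonal-naive model, and a crude constant-width 95% interval built from the spread of in-sample seasonal differences. `seasonal_naive_interval` and `backtest` are hypothetical helpers. `n_splits=6` with `test_size=12` yields six yearly origins; leaving `max_train_size` unset gives an expanding window, and `max_train_size=24` turns it into a rolling window of the latest 24 months.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

SEASON = 12

def seasonal_naive_interval(train, horizon, z=1.96):
    """Seasonal-naive point forecast plus a crude constant-width interval.

    ASSUMPTION (illustrative, not the course's method): the half-width is
    z times the std of in-sample seasonal differences y[t] - y[t-12].
    """
    reps = int(np.ceil(horizon / SEASON))
    point = np.tile(train[-SEASON:], reps)[:horizon]
    sigma = np.std(train[SEASON:] - train[:-SEASON], ddof=1)
    return point, point - z * sigma, point + z * sigma

def backtest(y, cv):
    """Refit at every origin; return per-fold MAE and pooled empirical coverage."""
    maes, inside = [], []
    for train_idx, test_idx in cv.split(y):
        train, test = y[train_idx], y[test_idx]
        point, lower, upper = seasonal_naive_interval(train, len(test))
        maes.append(float(np.mean(np.abs(test - point))))
        inside.append((test >= lower) & (test <= upper))
    # Coverage: share of all held-out points that landed inside their interval.
    return np.array(maes), float(np.mean(np.concatenate(inside)))

# Hypothetical stand-in for Cyclepath (NOT the course's data): 96 months.
rng = np.random.default_rng(0)
months = np.arange(96)
y = 200 + 1.5 * months + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 15, 96)

expanding = TimeSeriesSplit(n_splits=6, test_size=12)                   # training window grows
rolling = TimeSeriesSplit(n_splits=6, test_size=12, max_train_size=24)  # latest 24 months only

for name, cv in (("expanding", expanding), ("rolling", rolling)):
    for train_idx, test_idx in cv.split(y):
        print(name, "train", train_idx[0], "-", train_idx[-1],
              "test", test_idx[0], "-", test_idx[-1])
    maes, coverage = backtest(y, cv)
    print(name, "MAE per origin:", np.round(maes, 1), "coverage:", round(coverage, 2))
```

Empirical coverage is simply the fraction of held-out actuals that fall inside their intervals; a nominal 95% interval that covers far fewer than 95% of backtested points is overconfident.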
Every backtest, every fold, and every coverage check in this module is actually computed on the same seeded Cyclepath series used throughout the course, with each model refit from scratch at every origin. Start with Lesson 1, on why one split was never quite enough.
Complete all 5 lessons to finish the Evaluation and Backtesting module.