<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Evaluation and Backtesting on DATATWEETS</title><link>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/</link><description>Recent content in Evaluation and Backtesting on DATATWEETS</description><generator>Hugo</generator><language>en</language><copyright>Copyright (c) 2025 Datatweets</copyright><lastBuildDate>Sun, 05 Jul 2026 09:00:00 +0200</lastBuildDate><atom:link href="https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/index.xml" rel="self" type="application/rss+xml"/><item><title>Lesson 1 - Why One Split Is Not Enough</title><link>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-1-why-one-split-is-not-enough/</link><pubDate>Sun, 05 Jul 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-1-why-one-split-is-not-enough/</guid><description>Since Module 1, every model in this course has been judged on a single twelve-month test period, 2023. Holding out 2021 or 2022 instead, with the exact same seasonal-naive baseline, gives a MAPE of 6.61% or 7.39% rather than 5.9%. The larger figure is about 25% higher than the 2023 result, a substantial swing caused by nothing more than the choice of test year. 
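The single-year comparison can be sketched in Python under stated assumptions: the series is a plain list of monthly values, and the helpers mape, seasonal_naive, and holdout_mape are hypothetical illustrations, not the course code. Holding out a different year only means changing test_start.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


def seasonal_naive(history, horizon, season=12):
    """Repeat the last observed season forward: each month gets the value of the same month in the most recent year."""
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def holdout_mape(series, test_start, test_len=12, season=12):
    """Train on everything before test_start, then score the next test_len months."""
    train = series[:test_start]
    test = series[test_start:test_start + test_len]
    return mape(test, seasonal_naive(train, len(test), season))


# Relative gap between the 2023 hold-out MAPE (5.9%) and the 7.39% alternative
swing = (7.39 - 5.9) / 5.9
print(f"{swing:.1%}")  # 25.3%
```

Scoring the same baseline with three different test_start values is exactly the comparison described above, and each call returns one sample from the range of possible outcomes.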
A single split reports one sample from a range of possible outcomes. This lesson measures that range directly, on the same series used throughout the course, before building the tools to test more thoroughly.</description></item><item><title>Lesson 2 - Walk-Forward Validation</title><link>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-2-walk-forward-validation/</link><pubDate>Sun, 05 Jul 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-2-walk-forward-validation/</guid><description>Walk-forward validation refits a model repeatedly at a sequence of forecast origins moving forward through the series, instead of fitting it once. An expanding window keeps every training point from the start of the series; a rolling window keeps only a fixed number of the most recent months. Built by hand on Cyclepath with six origins and a six-month horizon, SARIMA&amp;rsquo;s expanding-window MAPE averages 2.14% with a standard deviation of 1.11, ranging from 0.82% to 3.74% depending on the origin. A rolling window of the same size scores slightly worse (mean 2.24%), suggesting that Cyclepath&amp;rsquo;s stable, non-drifting structure rewards keeping all the historical data rather than discarding the oldest months.</description></item><item><title>Lesson 3 - TimeSeriesSplit and Formalizing the Loop</title><link>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-3-timeseriessplit-and-formalizing-the-loop/</link><pubDate>Sun, 05 Jul 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-3-timeseriessplit-and-formalizing-the-loop/</guid><description>scikit-learn&amp;rsquo;s TimeSeriesSplit(n_splits=6, test_size=6) reproduces, fold for fold, the same six expanding-window origins built by hand in Lesson 2: training sizes of 60, 66, 72, 78, 84, and 90 months, each followed by a 6-month test block. 
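A minimal sketch of the fold layout, assuming scikit-learn 0.24 or later (the release that added test_size) and a series of 96 monthly points. The length 96 is an assumption inferred from the fold sizes: six 6-month test blocks placed after a 60-month first training window end at month 96. The max_train_size=60 splitter is one way to express the rolling window from Lesson 2; the window size of 60 is also an assumption.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Assumption: 96 monthly observations; only the length matters for the splits.
y = np.arange(96)

# Expanding window: each fold trains on every month before its 6-month test block.
expanding = TimeSeriesSplit(n_splits=6, test_size=6)
for train_idx, test_idx in expanding.split(y):
    print(f"train months 0-{train_idx[-1]} ({len(train_idx)}), "
          f"test months {test_idx[0]}-{test_idx[-1]}")

# Rolling window (assumed size 60): only the 60 most recent months are kept.
rolling = TimeSeriesSplit(n_splits=6, test_size=6, max_train_size=60)

expanding_sizes = [len(tr) for tr, _ in expanding.split(y)]
rolling_sizes = [len(tr) for tr, _ in rolling.split(y)]
print(expanding_sizes)  # [60, 66, 72, 78, 84, 90]
print(rolling_sizes)    # [60, 60, 60, 60, 60, 60]
```

The splitter only produces index arrays, so the same loop works for any model: fit on y[train_idx], forecast len(test_idx) steps, and score against y[test_idx].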
Using it to backtest the seasonal-naive baseline confirms Lesson 1&amp;rsquo;s warning with a proper multi-origin summary: a mean MAPE of 6.64% with a standard deviation of 0.87, a far more honest description than any single year&amp;rsquo;s 5.9%, 6.61%, or 7.39%.</description></item><item><title>Lesson 4 - Forecast Intervals and Their Honesty</title><link>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-4-forecast-intervals-and-their-honesty/</link><pubDate>Sun, 05 Jul 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-4-forecast-intervals-and-their-honesty/</guid><description>A stated 95% forecast interval is a testable claim: across many forecasts, the actual value should fall inside it about 95% of the time. The fraction of actuals that really do fall inside is the interval&amp;rsquo;s empirical coverage. Backtested across all 36 forecasts from Module 8&amp;rsquo;s six origins (six months each), SARIMA&amp;rsquo;s 95% interval achieves 100% coverage at an average width of about 2,145 trips. A non-seasonal ARIMA also achieves 100% coverage, but only with intervals about five times wider (average width 10,732 trips): uninformative rather than wrong. A plain random-walk-with-drift model achieves neither goal: only 83.3% coverage, missing the actual value on 6 of the 36 forecasts, despite an interval nearly four times wider than SARIMA&amp;rsquo;s. 
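Empirical coverage and mean width are simple to compute once a backtest has produced, for every forecast, the actual value and the interval bounds. A minimal sketch, where interval_scores is a hypothetical helper and the toy numbers are made up only to mirror a 30-of-36 hit rate:

```python
def interval_scores(actuals, lower, upper):
    """Return (empirical coverage in percent, mean interval width)."""
    n = len(actuals)
    # An actual counts as covered when it lies inside [lower, upper].
    hits = sum(1 for y, lo, hi in zip(actuals, lower, upper) if hi >= y >= lo)
    coverage = 100 * hits / n
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return coverage, mean_width


# Toy backtest: 36 forecasts, 6 of which land outside their interval.
actuals = [100] * 30 + [140] * 6
lower = [90] * 36
upper = [120] * 36
coverage, width = interval_scores(actuals, lower, upper)
print(f"coverage {coverage:.1f}%, mean width {width:.0f}")  # coverage 83.3%, mean width 30
```

Returning both numbers together reflects the point of the lesson: a coverage figure is only meaningful next to the width it was bought with.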
Good intervals need both coverage close to the stated level and a narrow width, and only a backtest across many forecasts can check either one.</description></item><item><title>Lesson 5 - Guided Project: Backtesting Every Model From the Course</title><link>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-5-guided-project-backtesting-every-model/</link><pubDate>Sun, 05 Jul 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/time-series-forecasting/evaluation-and-backtesting/lesson-5-guided-project-backtesting-every-model/</guid><description>The Module 8 capstone backtests every forecasting model built in this course across six origins with a six-month horizon. Seasonal-naive averages 6.64% MAPE (std 0.87), SARIMA averages 2.14% (std 1.11), and Holt-Winters averages 1.59% (std 0.53). The single-split test in Modules 6 and 7 had ranked SARIMA (1.06%) ahead of Holt-Winters (1.57%). Backtested across six different origins, Holt-Winters is the more accurate model on average, and its MAPE varies less than half as much from origin to origin (std 0.53 vs. 1.11), directly overturning the single-split conclusion this course reported.</description></item></channel></rss>