Boosting Foundations on DATATWEETS

https://datatweets.com/courses/gradient-boosting/boosting-foundations/
Recent content in Boosting Foundations on DATATWEETS
Copyright (c) 2025 Datatweets
Sun, 05 Jul 2026 09:00:00 +0200

Lesson 1 - From Trees to Boosting
https://datatweets.com/courses/gradient-boosting/boosting-foundations/lesson-1-from-trees-to-boosting/
Sun, 05 Jul 2026 09:00:00 +0200
Bridge from decision trees and random forests to the idea of boosting. You will see why one tree has high variance, how bagging averages independent trees to reduce that variance, and how boosting instead trains trees one after another so each corrects the errors left by the trees before it. On the real California Housing dataset, you fit and compare a single decision tree (test RMSE 0.7069, R-squared 0.6187), a random forest (RMSE 0.5057, R-squared 0.8049), and a gradient boosting model (RMSE 0.5422, R-squared 0.7756), then watch boosting’s test error fall stage by stage from 1.0872 down to 0.4984.

Lesson 2 - How Gradient Boosting Works
https://datatweets.com/courses/gradient-boosting/boosting-foundations/lesson-2-how-gradient-boosting-works/
Sun, 05 Jul 2026 09:00:00 +0200
See exactly how gradient boosting builds a prediction as a sum of trees.
Starting from the mean of MedHouseVal on the real California Housing dataset, you will fit shallow regression trees to the residuals in a loop, watch the training RMSE fall from 1.156 toward 0.556, learn why a small learning rate plus more trees generalizes better, and confirm your hand-built booster matches scikit-learn’s GradientBoostingRegressor.

Lesson 3 - Gradient Boosting for Classification
https://datatweets.com/courses/gradient-boosting/boosting-foundations/lesson-3-gradient-boosting-for-classification/
Sun, 05 Jul 2026 09:00:00 +0200
Move gradient boosting from predicting numbers to predicting classes. You will see why a classifier works in log-odds space, how the sigmoid turns an accumulated log-odds into a probability, what log loss measures, and why the trees fit the pseudo-residual y minus p. Then you will train a GradientBoostingClassifier on the real Adult Income data, reaching about 84.7 percent test accuracy and 0.871 ROC AUC, and read the predicted probabilities for individual people.

Lesson 4 - Loss Functions and Pseudo-Residuals
https://datatweets.com/courses/gradient-boosting/boosting-foundations/lesson-4-loss-functions-and-pseudo-residuals/
Sun, 05 Jul 2026 09:00:00 +0200
Discover the idea at the heart of gradient boosting: every tree is fit not to a plain residual but to the pseudo-residual, the negative gradient of a chosen loss with respect to the current prediction.
Using a small slice of the real California Housing data and a hand-made set of labels, you will compute and print the pseudo-residuals for squared error, absolute error, and binary log loss, and see how each recovers a familiar quantity and tunes what the model cares about.

Lesson 5 - Guided Project: Build a Gradient Booster from Scratch
https://datatweets.com/courses/gradient-boosting/boosting-foundations/lesson-5-guided-project-build-a-gradient-booster-from-scratch/
Sun, 05 Jul 2026 09:00:00 +0200
Bring Module 1 together by building a complete gradient boosting regressor from scratch. You will load the real California Housing dataset, establish a predict-the-mean baseline at 1.145 test RMSE, implement a ScratchGradientBoostingRegressor class that fits shallow trees to residuals, and drive test RMSE down to 0.511 (R-squared 0.80). You then validate it against scikit-learn’s GradientBoostingRegressor, which lands at 0.511 as well, and explore how a smaller learning rate with more trees generalizes better.
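
The loop these lessons describe (start from the mean, fit a shallow tree to the current residuals, add a shrunken copy of that tree to the prediction) can be sketched in a few lines. This is only an illustration under stated assumptions, not the course's ScratchGradientBoostingRegressor: it uses the Python standard library alone, a small synthetic one-feature dataset instead of California Housing, depth-1 trees (stumps), and the hypothetical names fit_stump, rmse, and TinyBooster. For squared error the pseudo-residual is the plain residual y minus the current prediction, which is what each stump is fit to.

```python
# Minimal sketch of gradient boosting for squared error, standard library only.
# Assumptions (not from the course): synthetic 1-D data, depth-1 trees (stumps),
# and the hypothetical names fit_stump, rmse, and TinyBooster.
import math


def rmse(ys, preds):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


def fit_stump(xs, targets):
    """Return (threshold, left_value, right_value) minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    distinct = sorted(set(xs))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2.0  # candidate split halfway between neighbours
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lv = sum(left) / len(left)     # leaf value = mean target in the leaf
        rv = sum(right) / len(right)
        sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]


class TinyBooster:
    """Sum of shrunken stumps, each fit to the residuals of the sum so far."""

    def __init__(self, n_trees=100, learning_rate=0.1):
        self.n_trees = n_trees
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps = []
        self.train_rmse = []

    def fit(self, xs, ys):
        self.base = sum(ys) / len(ys)  # stage 0: predict the mean
        preds = [self.base] * len(ys)
        self.stumps, self.train_rmse = [], []
        for _ in range(self.n_trees):
            # Squared-error pseudo-residual: target minus current prediction.
            residuals = [y - p for y, p in zip(ys, preds)]
            thr, lv, rv = fit_stump(xs, residuals)
            self.stumps.append((thr, lv, rv))
            preds = [p + self.learning_rate * (lv if x <= thr else rv)
                     for x, p in zip(xs, preds)]
            self.train_rmse.append(rmse(ys, preds))
        return self

    def predict(self, xs):
        out = []
        for x in xs:
            total = self.base
            for thr, lv, rv in self.stumps:
                total += self.learning_rate * (lv if x <= thr else rv)
            out.append(total)
        return out


# Synthetic, deterministic data (an assumption for illustration only).
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) + 0.1 * x for x in xs]

baseline_rmse = rmse(ys, [sum(ys) / len(ys)] * len(ys))
model = TinyBooster(n_trees=100, learning_rate=0.1).fit(xs, ys)
final_rmse = model.train_rmse[-1]

print(f"predict-the-mean training RMSE: {baseline_rmse:.4f}")
print(f"boosted training RMSE:          {final_rmse:.4f}")
```

The numbers printed here are training errors on toy data and are not comparable to the California Housing figures quoted in the lesson descriptions. Swapping the stump for a deeper tree, or the residual for another loss's negative gradient, changes only the two marked steps inside fit.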