XGBoost in Depth on DATATWEETS

https://datatweets.com/courses/gradient-boosting/xgboost-in-depth/
Copyright (c) 2025 Datatweets
Sun, 05 Jul 2026 09:00:00 +0200

Lesson 1 - Introducing XGBoost
https://datatweets.com/courses/gradient-boosting/xgboost-in-depth/lesson-1-introducing-xgboost/
Sun, 05 Jul 2026 09:00:00 +0200

Take your first hands-on step with XGBoost after building a booster from scratch in Module 1. You will learn the four design choices that set XGBoost apart from plain gradient boosting (a regularized objective, second-order optimization, sparsity-aware splits, and speed engineering), then fit the same model two ways on the real California Housing data: the scikit-learn API (XGBRegressor, test RMSE 0.4696, R-squared 0.8317) and the native API (xgb.train on a DMatrix, identical test RMSE 0.4696). Both land well ahead of Module 1’s plain GradientBoostingRegressor (RMSE 0.5422, R-squared 0.7756) on the same split.

Lesson 2 - Inside the XGBoost Objective
https://datatweets.com/courses/gradient-boosting/xgboost-in-depth/lesson-2-inside-the-xgboost-objective/
Sun, 05 Jul 2026 09:00:00 +0200

Open the hood on what makes XGBoost different from the hand-built booster of Module 1: a regularized objective and a second-order Newton approximation. You will derive the gradient and hessian of squared-error loss, the closed-form optimal leaf weight w* = -G/(H + lambda), the leaf similarity score G^2/(H + lambda), the split gain that XGBoost maximizes, and the pruning rule controlled by gamma.
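The quantities named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lesson's own code: the six target values, the 0.5 starting prediction, lambda = 1, and gamma = 0 are all assumptions, picked so that the arithmetic reproduces the leaf weights (8.625 and 23.625) and split gain (76.42) quoted in this lesson's description. Plain Python stands in for numpy here to keep the sketch dependency-free.

```python
# Hypothetical six-row example: the targets, starting prediction, lambda,
# and gamma are illustrative assumptions, not the lesson's actual data.
y_left = [10.0, 12.0, 14.0]    # rows sent to the left leaf
y_right = [30.0, 32.0, 34.0]   # rows sent to the right leaf
base_pred = 0.5                # assumed starting prediction for every row
lam = 1.0                      # assumed L2 penalty (lambda)
gamma = 0.0                    # assumed per-split penalty


def grad_hess(targets, pred):
    """Summed gradient G and hessian H for the loss 0.5 * (pred - y)^2."""
    G = sum(pred - y for y in targets)  # g_i = pred - y_i
    H = float(len(targets))             # h_i = 1 for every row
    return G, H


def leaf_weight(G, H, lam):
    """Closed-form optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def similarity(G, H, lam):
    """Leaf similarity score G^2 / (H + lambda)."""
    return G * G / (H + lam)


G_l, H_l = grad_hess(y_left, base_pred)
G_r, H_r = grad_hess(y_right, base_pred)

w_left = leaf_weight(G_l, H_l, lam)
w_right = leaf_weight(G_r, H_r, lam)

# Split gain: half the similarity improvement over the unsplit node,
# minus gamma. A split whose gain is not positive would be pruned.
gain = 0.5 * (
    similarity(G_l, H_l, lam)
    + similarity(G_r, H_r, lam)
    - similarity(G_l + G_r, H_l + H_r, lam)
) - gamma

print(w_left, w_right, round(gain, 2))  # prints: 8.625 23.625 76.42
```

Raising gamma above 76.42 in this sketch would drive the gain negative, which is the condition under which the split is discarded.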
Then you will compute all of these by hand in numpy on a tiny six-row dataset and confirm that the leaf values from a real xgboost one-tree fit land exactly on your hand-computed weights of 8.625 and 23.625, with a split gain of 76.42.

Lesson 3 - Core Hyperparameters
https://datatweets.com/courses/gradient-boosting/xgboost-in-depth/lesson-3-core-hyperparameters/
Sun, 05 Jul 2026 09:00:00 +0200

Learn the five hyperparameters that most strongly control an XGBoost model and see each one’s real effect on the California Housing dataset. You will sweep n_estimators and watch test RMSE bottom out near 0.46, trade learning_rate against tree count, and watch the train-versus-test gap widen from 0.03 at max_depth=2 to 0.38 at max_depth=10. Then you will pull two complexity-limiting levers, min_child_weight and gamma, and see them shrink that overfitting gap by making splits more conservative, connecting each back to the gain and hessian formulas from Lesson 2.

Lesson 4 - Regularization and Sampling
https://datatweets.com/courses/gradient-boosting/xgboost-in-depth/lesson-4-regularization-and-sampling/
Sun, 05 Jul 2026 09:00:00 +0200

Meet the XGBoost hyperparameters that fight overfitting directly. Starting from a deliberately overfit model (train RMSE 0.1043, test RMSE 0.4523), you tune reg_lambda and reg_alpha to penalize large leaf weights, and subsample and colsample_bytree to train each tree on a random slice of the rows and features.
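To see why a larger reg_lambda penalizes large leaf weights, it helps to plug a few values into the leaf-weight formula from Lesson 2. The sketch below is illustrative only: the gradient sum G = -34.5 and hessian sum H = 3 are hypothetical, the function name is made up for this example, and the soft-thresholding form used for the L1 term (reg_alpha) is the commonly cited formulation, stated here as an assumption rather than taken from the lesson.

```python
def regularized_leaf_weight(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Leaf weight under L2 (lambda) and L1 (alpha) penalties.

    Assumed form: w* = -soft_threshold(G, alpha) / (H + lambda),
    which reduces to -G / (H + lambda) when alpha is 0.
    """
    if abs(G) <= reg_alpha:
        return 0.0  # the L1 penalty zeroes out a weakly supported leaf
    shrunk = G - reg_alpha if G > 0 else G + reg_alpha
    return -shrunk / (H + reg_lambda)


G, H = -34.5, 3.0  # hypothetical gradient and hessian sums for one leaf
lambdas = (0.0, 1.0, 10.0, 100.0)

weights = [regularized_leaf_weight(G, H, reg_lambda=lam) for lam in lambdas]

for lam, w in zip(lambdas, weights):
    print(f"reg_lambda={lam:>5}: w* = {w:.3f}")
```

With these inputs the weight falls from 11.5 at reg_lambda=0 to about 0.335 at reg_lambda=100: the same evidence from the data moves the prediction far less, which is how the penalty restrains overfitting.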
You will run real sweeps on the California Housing data and watch reg_lambda=100 shrink the train-test gap from 0.348 down to 0.174 while nudging test RMSE to its best value of 0.4374.

Lesson 5 - Guided Project: Tuning XGBoost on Real Data
https://datatweets.com/courses/gradient-boosting/xgboost-in-depth/lesson-5-guided-project-tuning-xgboost-on-real-data/
Sun, 05 Jul 2026 09:00:00 +0200

Put Module 2 to work by tuning XGBoost systematically on the real California Housing dataset. You establish a baseline XGBRegressor at 0.4626 test RMSE (R-squared 0.8367), then tune the structural knobs from Lesson 3 (max_depth, min_child_weight, and a learning_rate / n_estimators pair) on a held-out validation set, add the regularization and sampling knobs from Lesson 4 (reg_lambda, subsample, colsample_bytree) to shrink the train-test gap, and lock in a final model at 0.4419 test RMSE (R-squared 0.8510). The result is a real 4.5 percent reduction in error, earned with an honest, reproducible tuning workflow.
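Lesson 5's headline improvement can be checked with simple arithmetic. The helper below is a hypothetical convenience function, not part of the course code; it only reuses the two test RMSE values quoted in the Lesson 5 description.

```python
def relative_reduction(baseline, tuned):
    """Fractional drop in error when moving from baseline to tuned."""
    return (baseline - tuned) / baseline


baseline_rmse = 0.4626  # baseline XGBRegressor test RMSE, as quoted
final_rmse = 0.4419     # final tuned model test RMSE, as quoted

improvement = relative_reduction(baseline_rmse, final_rmse)
print(f"{improvement:.1%} lower test RMSE")  # prints: 4.5% lower test RMSE
```

Reporting the change relative to the baseline, rather than as a raw RMSE difference of about 0.02, makes the size of the gain easier to compare across datasets with different target scales.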