The library that dominates tabular machine learning: XGBoost's DMatrix and APIs, its regularized objective and the math behind how it scores a split, and the key hyperparameters that control it.
Welcome to XGBoost in Depth, the second module of the course. With a from-scratch booster behind you, you’re ready for the library that made gradient boosting famous. XGBoost is fast, accurate, and remarkably robust, and this module takes it apart so nothing about it stays a mystery.
You’ll fit your first XGBoost models with both its scikit-learn API and its native API, which is built around the DMatrix. Then you’ll look inside its regularized objective to see exactly how it uses gradients and hessians to compute each leaf’s weight and to score every candidate split. From there you’ll master the hyperparameters that control it: first the learning rate, max_depth, min_child_weight, and gamma, then the regularization knobs reg_lambda and reg_alpha and the sampling knobs subsample and colsample_bytree. The module ends with a guided project where you tune XGBoost on real data and measure the gain.
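As a preview of the leaf-weight and split-scoring math mentioned above, here is a minimal sketch in plain Python. It assumes the standard second-order formulation with an L2 penalty (reg_lambda) and a per-split penalty (gamma), and it leaves out reg_alpha for brevity. The function names leaf_weight, leaf_score, and split_gain are hypothetical helpers written for illustration; they are not part of the xgboost package and do not reproduce its actual implementation.

```python
# Illustrative sketch only: these helpers are hypothetical, not xgboost functions.
# For squared-error loss, each row's gradient is (prediction - target) and its
# hessian is 1.0.

def leaf_weight(grads, hess, reg_lambda=1.0):
    """Optimal leaf weight: w* = -G / (H + lambda), with G and H the summed
    gradients and hessians of the rows in the leaf."""
    G, H = sum(grads), sum(hess)
    return -G / (H + reg_lambda)


def leaf_score(G, H, reg_lambda=1.0):
    """Structure score of one leaf: G^2 / (H + lambda)."""
    return G * G / (H + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a candidate split: half the improvement in structure score over
    the unsplit parent, minus the gamma penalty for adding a leaf."""
    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)
    children = leaf_score(GL, HL, reg_lambda) + leaf_score(GR, HR, reg_lambda)
    parent = leaf_score(GL + GR, HL + HR, reg_lambda)
    return 0.5 * (children - parent) - gamma


# Toy data: targets [1, 1, 3, 3], current prediction 2 for every row.
grads = [1.0, 1.0, -1.0, -1.0]   # prediction - target
hess = [1.0, 1.0, 1.0, 1.0]      # constant for squared error

# Candidate split: first two rows go left, last two go right.
gain = split_gain(grads[:2], hess[:2], grads[2:], hess[2:])
w_left = leaf_weight(grads[:2], hess[:2])
w_right = leaf_weight(grads[2:], hess[2:])
print(gain, w_left, w_right)
```

On this toy data the gain is positive with gamma at 0, and raising gamma above the gain turns it negative, which is how gamma prunes weak splits. Raising reg_lambda shrinks both leaf weights toward zero.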
Every model here is trained for real with xgboost on the California Housing dataset. Start with Lesson 1, where you’ll fit your first XGBoost model and see why it is built differently.
Complete all 5 lessons to finish the XGBoost in Depth module.