Module · 5 lessons

XGBoost in Depth

A close look at XGBoost, the library that dominates tabular machine learning: its DMatrix and APIs, its regularized objective and the math behind how it scores a split, and the key hyperparameters that control it.


At a glance

Level: Intermediate to Advanced

Lessons: 5

Time to complete: 1–2 weeks

Cost: Free forever · no sign-up

Welcome to XGBoost in Depth, the second module of the course. With a from-scratch booster behind you, you’re ready for the library that made gradient boosting famous. XGBoost is fast, accurate, and remarkably robust, and this module takes it apart so nothing about it stays a mystery.

You’ll fit your first XGBoost models with both the scikit-learn API and the native API built around the DMatrix, then look inside the regularized objective to see exactly how XGBoost uses gradients and hessians to compute each leaf’s weight and to score every candidate split. From there you’ll work through the hyperparameters that control it. First come the five core ones: n_estimators, learning_rate, max_depth, min_child_weight, and gamma. Then come the regularization and sampling knobs: reg_lambda, reg_alpha, subsample, and colsample_bytree. The module ends with a guided project where you tune XGBoost on real data and measure the gain.
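As a preview of the objective described above, here is a minimal NumPy sketch of the leaf weight, the similarity score, and the split gain. It assumes squared-error loss (so each gradient is prediction minus label and each hessian is 1) and uses the formulas in the form given in the XGBoost paper. The function names and the four-row toy dataset are illustrative assumptions, not part of the xgboost API.

```python
import numpy as np

def leaf_weight(g, h, reg_lambda=1.0):
    # Optimal leaf weight: w* = -G / (H + lambda)
    return -g.sum() / (h.sum() + reg_lambda)

def similarity(g, h, reg_lambda=1.0):
    # Leaf similarity score: G^2 / (H + lambda)
    return g.sum() ** 2 / (h.sum() + reg_lambda)

def split_gain(g, h, left_mask, reg_lambda=1.0, gamma=0.0):
    # Gain = 1/2 * (left score + right score - parent score) - gamma
    left = similarity(g[left_mask], h[left_mask], reg_lambda)
    right = similarity(g[~left_mask], h[~left_mask], reg_lambda)
    parent = similarity(g, h, reg_lambda)
    return 0.5 * (left + right - parent) - gamma

# Toy data (hypothetical): four labels and a constant starting prediction of 5.0
y = np.array([1.0, 2.0, 8.0, 9.0])
pred = np.full_like(y, 5.0)

g = pred - y            # gradients of 1/2 * (y - pred)^2 with respect to pred
h = np.ones_like(y)     # hessians of squared-error loss are all 1

# Candidate split: first two rows go left, last two go right
left_mask = np.array([True, True, False, False])

gain = split_gain(g, h, left_mask)
w_left = leaf_weight(g[left_mask], h[left_mask])
w_right = leaf_weight(g[~left_mask], h[~left_mask])
print(gain, w_left, w_right)
```

With reg_lambda set to 1, the left leaf pulls predictions down and the right leaf pulls them up by the same amount. Raising reg_lambda shrinks both weights toward zero, and raising gamma lowers the gain of every candidate split by a constant.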

Every model here is trained for real with xgboost on the California Housing dataset. Start with Lesson 1, where you’ll fit your first XGBoost model and see why it is built differently.

Lessons in this module

1. Introducing XGBoost
Meet XGBoost, fit your first model on the real California Housing dataset with both its scikit-learn and native APIs, and see it beat plain gradient boosting on the same train/test split.

2. Inside the XGBoost Objective
Derive XGBoost's regularized objective, its second-order Taylor approximation with gradients and hessians, the optimal leaf weight, the leaf similarity score, and the split gain, then verify every formula against a real one-tree XGBoost fit in numpy.

3. Core Hyperparameters
Tune the five core XGBoost hyperparameters (n_estimators, learning_rate, max_depth, min_child_weight, and gamma) by running real train-versus-test RMSE sweeps on the California Housing dataset.

4. Regularization and Sampling
Use XGBoost's reg_lambda, reg_alpha, subsample, and colsample_bytree to shrink leaf weights and randomize each tree, closing the train-test gap on the real California Housing dataset.

5. Guided Project: Tuning XGBoost on Real Data
Tune an XGBoost regressor step by step on the real California Housing dataset, moving from a baseline to a measurably better, better-generalizing model.

Achievement

Complete all 5 lessons to finish the XGBoost in Depth module.
