Course

Gradient Boosting & XGBoost

Master the algorithm that wins competitions and powers real tabular systems — build gradient boosting from scratch, learn XGBoost inside out, then tune, explain, and deploy a real model in Python.

At a glance

Level: Intermediate to Advanced
Lessons: 21 lessons across 5 modules
What you build: A tuned, explained model on real data
Cost: 100% free · no API key needed

What you'll build

You'll work alongside Northwind Analytics, a small data team, building gradient-boosted models on two real datasets: predicting house prices with California Housing, and predicting whether a person earns more than $50K a year with the Adult Income dataset. You'll start by building a gradient booster from scratch in NumPy to see exactly how it learns. Then you'll master XGBoost end to end: its regularized objective, its hyperparameters, robust training with early stopping and cross-validation, imbalanced and categorical data, feature importance, SHAP explanations, and Optuna tuning. Finally, you'll ship a saved, deployable model. Every model is trained for real with xgboost, scikit-learn, shap, and optuna, so your numbers should closely match the ones shown.
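The core loop of a from-scratch booster is short. Below is a minimal sketch, assuming squared-error loss and depth-1 trees (stumps); it is not the course's own implementation, and the names `fit_stump`, `fit_gbm`, and `predict_gbm` are hypothetical. Each round fits a stump to the current residuals, which for squared error are exactly the negative gradient, then adds a shrunken copy of that stump to the running prediction.

```python
import numpy as np


def fit_stump(X, residuals):
    """Least-squares stump: the single-feature threshold split that best fits the residuals."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(X.shape[1]):
        # Skip the largest value so both sides of the split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            left_val = residuals[left].mean()
            right_val = residuals[~left].mean()
            sse = ((residuals[left] - left_val) ** 2).sum() + ((residuals[~left] - right_val) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, left_val, right_val)
    return best[1:]  # (feature, threshold, left_value, right_value)


def predict_stump(stump, X):
    j, t, left_val, right_val = stump
    return np.where(X[:, j] <= t, left_val, right_val)


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the negative gradient."""
    base = y.mean()  # start from a constant prediction
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        stump = fit_stump(X, residuals)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return base, stumps, learning_rate


def predict_gbm(model, X):
    base, stumps, learning_rate = model
    pred = np.full(X.shape[0], base)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, X)
    return pred


# Usage on a small synthetic regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(150, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=150)

model = fit_gbm(X, y, n_rounds=50)
mse = np.mean((y - predict_gbm(model, X)) ** 2)
print(f"baseline MSE {np.var(y):.3f} -> boosted MSE {mse:.3f}")
```

Because each stump is a least-squares fit to the residuals and the learning rate is below 1, training error never increases from one round to the next. A smaller `learning_rate` needs more rounds to reach the same training error, which is the trade-off XGBoost exposes through its own learning-rate and number-of-rounds settings.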

Course syllabus

Work through the modules at your own pace. Each lesson is a self-contained, hands-on read.

1. Boosting Foundations · 5 lessons · 1 week
2. XGBoost in Depth · 5 lessons · 1–2 weeks
3. Training Robust Models · 5 lessons · 1–2 weeks
4. Interpretation, Tuning & Deployment · 5 lessons · 1–2 weeks
5. Capstone · 1 lesson · 3–4 hours

Before you start

You'll need comfortable Python with pandas and numpy, and a working understanding of the machine learning workflow — training and test splits, overfitting, and how a decision tree makes a prediction. Our Machine Learning Foundations and Trees & Ensembles modules are ideal preparation. This course picks up exactly where trees and random forests leave off, and takes boosting all the way to a production-ready model.

Set up your environment

You can complete this course on any machine with Python 3.10+. There's no API key and nothing to pay for — both datasets ship with scikit-learn or download once and cache locally.

  1. Install the packages the course uses:
pip install xgboost lightgbm scikit-learn shap optuna pandas numpy

Every dataset is loaded through scikit-learn, and every split uses a fixed random seed, so your model outputs will closely match the ones shown and you can rerun any experiment end to end.
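A fixed seed makes splits repeatable because scikit-learn's `train_test_split` shuffles rows with a generator seeded by `random_state`. The sketch below uses small synthetic arrays instead of the course datasets so it runs offline; the seed value `42` is an assumption, not necessarily the one the lessons use. For the real data, scikit-learn provides loaders such as `sklearn.datasets.fetch_california_housing`, which downloads once and caches the files under `~/scikit_learn_data` by default.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data; with the course datasets you would pass the real feature matrix and target.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

SEED = 42  # hypothetical seed value; any fixed integer gives repeatable splits

# Two independent calls with the same random_state select identical rows in identical order.
split_a = train_test_split(X, y, test_size=0.2, random_state=SEED)
split_b = train_test_split(X, y, test_size=0.2, random_state=SEED)

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)
print(all(np.array_equal(a, b) for a, b in zip(split_a, split_b)))
```

Passing the same seed to every estimator that has a `random_state` parameter (for example, a booster's row and column subsampling) extends the same repeatability from the split to the trained model.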

Package APIs shift over time. If an xgboost or shap signature has changed since this course was written, the boosting concepts still apply; adjust the syntax to match the versions you have installed.
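Before adjusting any syntax, it helps to know exactly which versions you have. Here is a small sketch using only the standard library's `importlib.metadata`; the package list mirrors the `pip install` command above, and the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip.
PACKAGES = ["xgboost", "lightgbm", "scikit-learn", "shap", "optuna", "pandas", "numpy"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version string, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:>12}  {ver or 'NOT INSTALLED'}")
```

Comparing this output against a library's changelog is usually the quickest way to find out whether a renamed parameter or moved function explains an error.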

Ready to win with gradient boosting?

Start with why boosting beats a single tree, and work through every module from a from-scratch booster to a tuned, explained, deployable XGBoost model.


Want this taught live to your team?

Mehdi runs tailored corporate workshops on this exact material — hands-on, in-person or remote.

Learn about corporate training →
Sponsor

Keep DATATWEETS free. Help fund practical data, AI, and engineering lessons for learners worldwide.
