Capstone on DATATWEETS

https://datatweets.com/courses/gradient-boosting/capstone/
Copyright (c) 2025 Datatweets
Sun, 05 Jul 2026 09:00:00 +0200

Lesson 1 - Guided Project: An Honest, Tuned, Explained Model
https://datatweets.com/courses/gradient-boosting/capstone/lesson-1-guided-project-an-honest-tuned-explained-model/
Sun, 05 Jul 2026 09:00:00 +0200

The course capstone takes one real dataset, Adult Income (48,842 rows, about 24 percent earning >50K), from raw data to a trustworthy model using skills from all four modules. A naive XGBClassifier baseline scores 0.8650 accuracy and 0.9183 ROC AUC, but its positive-class recall is a weak 0.6506. Adding scale_pos_weight and early stopping, followed by a 25-trial Optuna study, lifts recall to 0.8563 and ROC AUC to 0.9299, while average precision climbs from 0.8100 to 0.8336. SHAP then ranks marital-status, age, and capital-gain as the top drivers and decomposes a single high-earner prediction exactly in log-odds space.
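Two of the ideas in this summary reduce to simple arithmetic, sketched below in plain Python with no external libraries. The first function applies the common heuristic of setting scale_pos_weight to the ratio of negative to positive examples. The summary does not state the value the lesson actually used, so the result here is only what the heuristic gives for a 24 percent positive rate. The second part shows what an exact decomposition in log-odds space means: a base value plus the per-feature contributions equals the model's raw log-odds, and the sigmoid of that sum is the predicted probability. The base value and contribution numbers are hypothetical, chosen only for illustration, and are not results from the lesson.

```python
import math


def scale_pos_weight(n_rows: int, positive_rate: float) -> float:
    """Negative-to-positive ratio, a common starting heuristic for scale_pos_weight."""
    n_pos = n_rows * positive_rate
    n_neg = n_rows - n_pos
    return n_neg / n_pos


def sigmoid(z: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def decompose(base_value: float, contributions: dict[str, float]) -> tuple[float, float]:
    """Sum a base value and per-feature contributions in log-odds space.

    Returns the total log-odds and the corresponding probability.
    """
    log_odds = base_value + sum(contributions.values())
    return log_odds, sigmoid(log_odds)


# Class balance taken from the summary: 48,842 rows, about 24 percent positive.
spw = scale_pos_weight(48_842, 0.24)

# HYPOTHETICAL numbers for illustration only; not values reported by the lesson.
base_value = -1.2
contributions = {"marital-status": 1.1, "age": 0.4, "capital-gain": 2.0}

log_odds, prob = decompose(base_value, contributions)

print(f"scale_pos_weight heuristic: {spw:.2f}")
print(f"log-odds: {log_odds:.2f}, probability: {prob:.3f}")
```

With a real model, the contributions would come from a SHAP explainer run on the tuned classifier rather than from hand-written numbers, and their sum plus the explainer's base value should match the model's raw margin for that row.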