Lesson 2 - Interpreting the Regression Parameters

Welcome to Interpreting Logistic Regression

In the previous lesson you saw how the sigmoid function turns a straight-line score into a probability, which is what lets logistic regression do classification. In this lesson you go one level deeper: you fit a real model and learn to read what it is telling you. A logistic regression is not a black box. Every coefficient has a precise, business-friendly meaning, once you know how to translate it.

By the end of this lesson, you will be able to:

  • Create and fit a LogisticRegression model with scikit-learn
  • Access the fitted intercept_ and coef_ attributes
  • Explain why logistic regression coefficients live on the log-odds scale
  • Convert a coefficient into an odds ratio by exponentiating it
  • Read the direction and magnitude of each feature’s effect on churn

You should be comfortable with the machine learning workflow from the foundations module and with the sigmoid idea from Lesson 1 of this module. Let’s begin.


The Problem: Who Will Churn?

Imagine you work at a telecom company. Every month some customers cancel their service, an event the business calls churn. Losing a customer is expensive, and winning them back is harder than keeping them in the first place. So the question that matters is: which customers are at risk of leaving, and why?

That second word, why, is the heart of this lesson. A model that only spits out “this customer will churn” is useful, but a model that also tells you what drives churn is far more valuable. It lets the business act: offer a longer contract, fix a pain point, target a retention campaign. Logistic regression is popular precisely because it answers both questions at once. It predicts, and it explains.

You will work with a real customer churn dataset, a classic record of telecom subscribers and whether each one canceled.

import pandas as pd

# download: https://datatweets.com/datasets/customer_churn.csv
df = pd.read_csv("customer_churn.csv")

print("Shape:", df.shape)
print(df["Churn"].value_counts().to_dict())
# Output:
# Shape: (7032, 12)
# {'No': 5163, 'Yes': 1869}

The dataset has 7,032 customers described by 11 features, plus the Churn column that records whether each one left. About 27 percent of customers churned.

churn_rate = (df["Churn"] == "Yes").mean()
print("churn rate:", round(churn_rate, 3))
# Output: churn rate: 0.266

The features are already model-ready

The version of the dataset you load here has already been prepared for modeling: the numeric columns are present as numbers, and the categorical columns (like Contract and InternetService) have been turned into 0/1 indicator columns through one-hot encoding. That is why you see names like Contract_Two year and InternetService_Fiber optic. You will learn how to do this encoding yourself in a later lesson; for now it lets you focus entirely on interpreting the fitted model.


Fitting a LogisticRegression Model

Logistic regression in scikit-learn follows the exact same three-step pattern you already know: prepare your features and target, instantiate the model, then call .fit(). The only new idea in this lesson comes after fitting, when you read the coefficients.

First, separate the features from the target and convert the text target into numbers.

from sklearn.model_selection import train_test_split

feature_cols = [
    "tenure", "MonthlyCharges", "TotalCharges", "SeniorCitizen",
    "Contract_One year", "Contract_Two year",
    "InternetService_Fiber optic", "InternetService_No",
    "Partner_Yes", "Dependents_Yes", "PaperlessBilling_Yes",
]

X = df[feature_cols]                     # features
y = (df["Churn"] == "Yes").astype(int)   # target: 1 = churned, 0 = stayed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

Just as in the foundations module, the numeric features sit on very different scales: tenure runs from 0 to about 72 months, while TotalCharges reaches into the thousands of dollars. To make the coefficients comparable to each other, you standardize the features so each one has a mean of 0 and a standard deviation of 1.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on TRAIN only
X_test_scaled = scaler.transform(X_test)        # apply same transform to test

Now build and train the model. The interface is identical to the KNN classifier you met earlier, just a different import.

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train_scaled, y_train)

print("Model trained!")
# Output: Model trained!

That is the whole training step. Behind that single .fit() call, scikit-learn searched for the set of coefficients that makes the observed churn labels most probable under the model, subject to its default L2 penalty on coefficient size. The interesting work now is understanding the numbers it found.
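To make the idea of "searching for coefficients" concrete, here is a minimal sketch that fits a one-feature logistic regression with plain gradient descent on the average log-loss. It is only an illustration: the toy data is hypothetical, there is no regularization, and scikit-learn uses a different and more sophisticated solver (lbfgs by default) rather than this loop.

```python
import math

def sigmoid(z):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(b0, b1, xs, ys):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

# HYPOTHETICAL toy data: one standardized feature, label 1 = churned.
# The classes overlap on purpose so the coefficients stay finite.
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1, 1, 0, 1, 0, 0, 1, 0]

b0, b1 = 0.0, 0.0          # start with "no information"
learning_rate = 0.1
start_loss = log_loss(b0, b1, xs, ys)

for _ in range(2000):
    # Gradient of the average log-loss with respect to b0 and b1
    errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
    g0 = sum(errors) / len(xs)
    g1 = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    b0 -= learning_rate * g0
    b1 -= learning_rate * g1

end_loss = log_loss(b0, b1, xs, ys)
print("intercept:", round(b0, 3), " slope:", round(b1, 3))
print("log-loss before:", round(start_loss, 4), " after:", round(end_loss, 4))
```

The slope comes out negative because, in this toy data, churners cluster at lower feature values. The loss after training is lower than the starting loss, which is all "fitting" means here.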

Why standardize before interpreting?

You scaled the features so every coefficient is expressed in the same unit: “one standard deviation of this feature.” Without scaling, a coefficient on TotalCharges (measured in dollars) and one on tenure (measured in months) would not be directly comparable, because a one-unit change means something completely different for each. Standardizing puts them on equal footing, so a coefficient farther from zero genuinely means a bigger effect per standard deviation.
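As a rough sketch of what standardizing does to a single column, the snippet below computes z-scores by hand. The tenure values are hypothetical, and the helper name standardize is made up for illustration; it uses the population standard deviation, which is the convention StandardScaler follows.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (divides by n)
    return [(v - mean) / std for v in values]

# HYPOTHETICAL tenure values in months, for illustration only
tenure_months = [1, 12, 24, 48, 72]
z = standardize(tenure_months)

print([round(v, 2) for v in z])
print("mean of z-scores:", round(abs(statistics.fmean(z)), 6))
print("std of z-scores: ", round(statistics.pstdev(z), 6))
```

After the transform the column has mean 0 and standard deviation 1, so "one unit" now means "one standard deviation" no matter what the original unit was.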


The Logit: Why Coefficients Are Not Probabilities

Here is the single most important idea in this lesson. In ordinary linear regression, a coefficient directly changes the predicted number: add one to the feature, add the coefficient to the output. Logistic regression looks almost identical, but there is a twist. The linear part does not act on the probability directly. It acts on the log-odds of the outcome.

Let $p$ be the probability that a customer churns. The odds of churn are the ratio of churning to not churning, $\frac{p}{1-p}$, and the log-odds (also called the logit) are the natural logarithm of that ratio. Logistic regression assumes this log-odds is a straight-line function of the features:

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

So each coefficient $\beta_j$ tells you how much the log-odds of churn change when feature $x_j$ increases by one unit (one standard deviation, since you scaled). The coefficients are perfectly linear, but on the log-odds scale, not on the probability scale.

This is why you cannot read a logistic coefficient as “this raises the churn probability by 0.3.” It does not. It raises the log-odds by its value, and log-odds are not intuitive on their own. Fortunately, there is a clean way to translate them into something a business person can act on.

It helps to keep three views of the same outcome straight in your head:

   probability            odds                   log-odds (logit)
   -----------            ----                   ----------------
   p                      p / (1 - p)            log( p / (1 - p) )
   between 0 and 1        between 0 and +inf     between -inf and +inf
   "how likely"           "how many times more   "the linear score the
                           likely than not"       model actually fits"

The model fits the log-odds because that quantity can range freely from negative to positive infinity, which is exactly what a linear combination produces. You will interpret in terms of odds, because odds are the most intuitive of the three. The bridge between them is the exponential function, which you will use next.
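The three views can be converted into one another with a few lines of arithmetic. This is a small sketch using only the standard library; the function names are made up for illustration.

```python
import math

def prob_to_odds(p):
    """Probability -> odds: p / (1 - p)."""
    return p / (1 - p)

def odds_to_log_odds(odds):
    """Odds -> log-odds (logit): natural log of the odds."""
    return math.log(odds)

def log_odds_to_prob(z):
    """Log-odds -> probability: the sigmoid function."""
    return 1 / (1 + math.exp(-z))

for p in (0.1, 0.266, 0.5, 0.9):
    odds = prob_to_odds(p)
    logit = odds_to_log_odds(odds)
    back = log_odds_to_prob(logit)
    print(f"p={p:.3f}  odds={odds:.3f}  log-odds={logit:+.3f}  back to p={back:.3f}")
```

A probability of 0.5 gives odds of 1 and log-odds of 0, and the round trip always lands back on the starting probability. The 0.266 row is the overall churn rate from this dataset.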


Interpreting the Intercept

Start with the simplest parameter, the intercept $\beta_0$, stored in the intercept_ attribute.

print("intercept:", round(model.intercept_[0], 3))
# Output: intercept: -1.703

What does $-1.703$ mean? Set every feature to zero in the logit equation and everything but the intercept disappears:

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 = -1.703$$

So the intercept is the log-odds of churn for a customer whose features are all zero. Because you standardized every column, including the 0/1 indicators, “all features zero” means a customer sitting at the training-set average of every feature. For an indicator, that average is the share of customers in that category, not the baseline category itself. Log-odds are hard to feel, so exponentiate to get the odds:

import numpy as np

baseline_odds = np.exp(model.intercept_[0])
print("baseline odds:", round(baseline_odds, 3))
# Output: baseline odds: 0.182

Odds of about 0.18 mean this baseline customer is roughly five and a half times more likely to stay than to churn (since $1 / 0.18 \approx 5.5$). That makes sense: only about a quarter of customers churn overall, so the average customer is far more likely to stay. A negative intercept always corresponds to odds below 1, which means the event is less likely than not.
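The same arithmetic can be carried one step further to a probability. This sketch hard-codes the intercept printed above instead of reading it from the model, so it runs on its own.

```python
import math

intercept = -1.703  # value printed by the lesson's fitted model

baseline_odds = math.exp(intercept)                   # log-odds -> odds
baseline_prob = baseline_odds / (1 + baseline_odds)   # odds -> probability
stay_to_churn = 1 / baseline_odds                     # how many "stays" per "churn"

print("baseline odds:       ", round(baseline_odds, 3))
print("baseline probability:", round(baseline_prob, 3))
print("stay-to-churn ratio: ", round(stay_to_churn, 2))
# Output:
# baseline odds:        0.182
# baseline probability: 0.154
# stay-to-churn ratio:  5.49
```

The baseline probability (about 15 percent) is lower than the overall churn rate of about 27 percent. That is not a contradiction: the probability for a customer at the average feature values is generally not the same as the average probability across all customers, because the sigmoid is non-linear.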

Read the sign first

Before doing any arithmetic, glance at the sign of a log-odds value. Positive log-odds mean the event is more likely than not (odds above 1). Negative log-odds mean it is less likely than not (odds below 1). Exactly zero log-odds means a coin flip, with odds of exactly 1. This quick check tells you the direction of every effect at a glance.


Interpreting a Coefficient: The Odds Ratio

Now the part that makes logistic regression so useful. Look at the coefficient for tenure, how long a customer has been with the company.

coefs = dict(zip(feature_cols, model.coef_[0]))
print("tenure coef:", round(coefs["tenure"], 3))
# Output: tenure coef: -1.337

The coefficient is $-1.337$. On the log-odds scale, a one-standard-deviation increase in tenure lowers the log-odds of churn by 1.337. That is correct but unintuitive. To make it speak plainly, exponentiate it to get the odds ratio.

The reason exponentiating works is worth seeing once. Compare the odds at two feature values that differ by one unit. Call the odds when the feature is at some value $O_0$, and the odds after a one-unit increase $O_1$. From the logit equation, the difference in log-odds is exactly the coefficient:

$$\log(O_1) - \log(O_0) = \beta_j \quad\Longrightarrow\quad \frac{O_1}{O_0} = e^{\beta_j}$$

That ratio $\frac{O_1}{O_0}$ is the odds ratio: the factor by which the odds get multiplied when the feature goes up by one unit. Compute it for tenure:

tenure_or = np.exp(coefs["tenure"])
print("tenure odds ratio:", round(tenure_or, 3))
# Output: tenure odds ratio: 0.263

An odds ratio of 0.263 has a clean reading: each one-standard-deviation increase in tenure multiplies the odds of churn by about 0.26, cutting them to roughly a quarter. In plain English, the longer a customer has been with the company, the less likely they are to leave, and by a wide margin. That matches intuition, and now you have it as a precise, defensible number.
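A quick numerical check of the derivation: wherever you start on the tenure scale, a one-unit step multiplies the odds by the same factor. The sketch below uses the intercept and tenure coefficient printed earlier and assumes every other standardized feature is held at 0.

```python
import math

intercept = -1.703    # from the lesson's output
beta_tenure = -1.337  # from the lesson's output

def odds(z_tenure):
    """Odds of churn at a standardized tenure value, other features held at 0."""
    return math.exp(intercept + beta_tenure * z_tenure)

for start in (-1.0, 0.0, 1.0):
    ratio = odds(start + 1) / odds(start)
    print(f"tenure z: {start:+.1f} -> {start + 1:+.1f}   odds ratio = {ratio:.3f}")

print("exp(beta_tenure) =", round(math.exp(beta_tenure), 3))
```

Every row reports the same ratio, and it equals the exponentiated coefficient. The odds themselves differ at each starting point; only the multiplier is constant.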

The rules for reading an odds ratio are simple:

  • An odds ratio greater than 1 means the feature increases the odds of churn.
  • An odds ratio less than 1 means the feature decreases the odds of churn.
  • An odds ratio equal to 1 means the feature has no effect on the odds.

Now contrast tenure with a feature that pushes the other way, TotalCharges, the cumulative amount a customer has paid.

tc_coef = coefs["TotalCharges"]
tc_or = np.exp(tc_coef)
print("TotalCharges coef:", round(tc_coef, 3))
print("TotalCharges odds ratio:", round(tc_or, 3))
# Output:
# TotalCharges coef: 0.631
# TotalCharges odds ratio: 1.879

Here the coefficient is positive ($+0.631$), so the odds ratio is above 1 ($1.879$). Each one-standard-deviation increase in total charges multiplies the odds of churn by about 1.88, nearly doubling them. The sign tells you the direction; the size tells you the strength.

Odds ratio, not probability ratio

An odds ratio of 1.88 does not mean the customer is 1.88 times more likely to churn. It means the odds are multiplied by 1.88. Because the odds-to-probability relationship is non-linear, the same odds ratio shifts the probability a lot when churn is near 50 percent and only a little when it is near 0 or 100 percent. Keep odds ratios in odds language, and use predict_proba when you specifically need a probability.
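To see the difference between an odds ratio and a probability ratio, the sketch below applies the TotalCharges odds ratio to several hypothetical starting probabilities. The helper apply_odds_ratio is made up for illustration.

```python
def apply_odds_ratio(p, odds_ratio):
    """Multiply the odds implied by probability p, then convert back to a probability."""
    new_odds = (p / (1 - p)) * odds_ratio
    return new_odds / (1 + new_odds)

odds_ratio = 1.879  # TotalCharges odds ratio from the lesson

# HYPOTHETICAL starting churn probabilities
for p in (0.02, 0.10, 0.50, 0.90, 0.98):
    new_p = apply_odds_ratio(p, odds_ratio)
    print(f"start p={p:.2f} -> new p={new_p:.3f}  "
          f"change={new_p - p:+.3f}  probability ratio={new_p / p:.2f}")
```

The odds are multiplied by 1.879 in every row, yet the probability ratio is always smaller than that, and the absolute change is largest near 0.50 (roughly 0.50 to 0.65) and tiny near the extremes.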


Reading All the Coefficients Together

Because you standardized the features, every coefficient is on the same scale, so you can line them up and compare magnitudes directly. Sorting them turns the model into a ranked story of what drives churn.

import numpy as np

# pair each feature with its coefficient and odds ratio, sorted
order = np.argsort(model.coef_[0])
for i in order:
    name = feature_cols[i]
    coef = model.coef_[0][i]
    print(f"{name:<28} coef={coef:+.3f}  odds_ratio={np.exp(coef):.3f}")
# Output:
# tenure                       coef=-1.337  odds_ratio=0.263
# Contract_Two year            coef=-0.695  odds_ratio=0.499
# InternetService_No           coef=-0.407  odds_ratio=0.665
# Contract_One year            coef=-0.329  odds_ratio=0.719
# Dependents_Yes               coef=-0.124  odds_ratio=0.884
# MonthlyCharges               coef=-0.113  odds_ratio=0.894
# Partner_Yes                  coef=+0.019  odds_ratio=1.019
# SeniorCitizen                coef=+0.118  odds_ratio=1.125
# PaperlessBilling_Yes         coef=+0.208  odds_ratio=1.231
# InternetService_Fiber optic  coef=+0.499  odds_ratio=1.647
# TotalCharges                 coef=+0.631  odds_ratio=1.879

A bar chart makes the pattern obvious at a glance. Features with negative coefficients (to the left of zero) protect against churn; features with positive coefficients (to the right) raise the risk, and the length of each bar shows how strong the effect is.

[Figure: Horizontal bar chart of standardized logistic regression coefficients, sorted from most churn-reducing to most churn-increasing. Caption: Standardized log-odds coefficients: bars left of zero reduce churn odds, bars right of zero increase them.]

Read top to bottom, this is a complete business briefing:

  • tenure is by far the strongest protective factor. Long-tenured customers are stable; the odds of churn shrink to about a quarter (odds ratio 0.263) per standard deviation of tenure.
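  • Contract_Two year roughly halves the odds of churn (odds ratio 0.499) per standard deviation of the indicator. Because the 0/1 column was standardized along with everything else, actually moving a customer from the baseline month-to-month contract to a two-year contract spans at least two standard deviations, so the full effect of the switch is stronger than a halving. Locking customers into longer contracts clearly keeps them.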
  • InternetService_Fiber optic and TotalCharges are the biggest churn drivers, with odds ratios of 1.647 and 1.879. Fiber customers and high-total-spend customers leave more often, a strong signal for the retention team to investigate.
  • Features near zero, like Partner_Yes (odds ratio 1.019), barely move the odds at all. The model is telling you they carry little independent information about churn.
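One caveat when reading the indicator columns: they were standardized together with the numeric ones, so their odds ratios are per standard deviation of the 0/1 column, not per switch from 0 to 1. The sketch below shows how the two relate. The share of two-year customers is a HYPOTHETICAL number chosen for illustration; with the real pipeline you would read the column's scale from the fitted scaler instead.

```python
import math

coef_per_sd = -0.695  # Contract_Two year coefficient from the lesson (per standard deviation)
share = 0.24          # HYPOTHETICAL fraction of training customers on a two-year contract

# A 0/1 column with mean `share` has population standard deviation sqrt(share * (1 - share))
std = math.sqrt(share * (1 - share))

# Moving the raw indicator from 0 to 1 moves the standardized column by 1 / std,
# so the log-odds change for a full switch is the coefficient divided by std
coef_per_switch = coef_per_sd / std

print("std of indicator:      ", round(std, 3))
print("odds ratio per std:    ", round(math.exp(coef_per_sd), 3))
print("odds ratio for 0 -> 1: ", round(math.exp(coef_per_switch), 3))
```

Since a 0/1 column can never have a standard deviation above 0.5, the odds ratio for a full switch is always farther from 1 than the per-standard-deviation figure. The ranking by standardized coefficient is still a useful summary; it just answers a slightly different question.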

This single chart is why teams reach for logistic regression even when fancier models exist. It does not just predict; it hands you a prioritized list of levers.

Coefficients are ‘controlling for’ the others

In a model with many predictors, each coefficient describes the effect of that feature holding the others fixed. Researchers call this “controlling for” the other variables. So the TotalCharges odds ratio of 1.879 is the effect of total charges after accounting for tenure, contract type, and everything else in the model, not its effect in isolation.


From Odds Back to Probability

Odds and odds ratios are the natural language for interpreting the model, but sometimes you genuinely need a probability for a specific customer, for example to rank everyone by churn risk. The predict_proba method gives you that, applying the sigmoid you met in Lesson 1 to the log-odds.

probs = model.predict_proba(X_test_scaled)
print(probs[:3])
# Output:
# [[0.91 0.09]
#  [0.62 0.38]
#  [0.78 0.22]]

Each row has two numbers: the probability of staying (class 0) and the probability of churning (class 1), which always sum to 1. The first customer has only a 9 percent chance of churning; the second is much riskier at 38 percent.
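For a two-class model like this one, each row of predict_proba is the sigmoid applied to that customer's log-odds. The sketch below rebuilds one row by hand from the rounded intercept and coefficients printed in the lesson. The customer is HYPOTHETICAL: one standard deviation below average tenure, one standard deviation above average on the fiber indicator, and average on everything else.

```python
import math

intercept = -1.703  # from the lesson's output
coefs = {           # rounded coefficients from the lesson's output
    "tenure": -1.337, "Contract_Two year": -0.695, "InternetService_No": -0.407,
    "Contract_One year": -0.329, "Dependents_Yes": -0.124, "MonthlyCharges": -0.113,
    "Partner_Yes": 0.019, "SeniorCitizen": 0.118, "PaperlessBilling_Yes": 0.208,
    "InternetService_Fiber optic": 0.499, "TotalCharges": 0.631,
}

# HYPOTHETICAL customer, expressed in standardized units
customer = {name: 0.0 for name in coefs}
customer["tenure"] = -1.0
customer["InternetService_Fiber optic"] = 1.0

# Linear score on the log-odds scale, then the sigmoid
log_odds = intercept + sum(coefs[name] * customer[name] for name in coefs)
p_churn = 1 / (1 + math.exp(-log_odds))
row = [1 - p_churn, p_churn]  # [P(stay), P(churn)], same layout as predict_proba

print("log-odds:", round(log_odds, 3))
print("row:     ", [round(v, 3) for v in row])
```

Short tenure and fiber service together push the log-odds from the baseline of -1.703 to slightly above zero, which puts this customer just over the 50 percent mark.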

Because tenure dominates the model, you can see the whole story by plotting predicted churn probability against tenure while holding the other features at their average. The curve is the sigmoid in action: high churn risk for brand-new customers, falling steeply as tenure grows, then leveling off for long-term loyalists.

[Figure: Line chart showing predicted churn probability decreasing as customer tenure increases, following an S-shaped sigmoid curve. Caption: Predicted churn probability falls steeply as tenure grows, the negative tenure coefficient seen as a probability.]

Notice that the relationship is not a straight line, even though the coefficient was. That is the whole point of the logit: the model is linear on the log-odds scale, which becomes an S-shaped curve once you convert back to probability. A constant change in log-odds moves the probability a lot in the middle of the curve and very little near the flat ends.
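The uneven steps are easy to verify numerically. This sketch walks standardized tenure upward in equal steps, holding the other features at 0, and prints how much the predicted probability drops at each step. The range of z-values is illustrative and may extend beyond what the real data covers.

```python
import math

intercept = -1.703    # from the lesson's output
beta_tenure = -1.337  # from the lesson's output

def churn_prob(z_tenure):
    """Predicted churn probability at a standardized tenure value, other features at 0."""
    return 1 / (1 + math.exp(-(intercept + beta_tenure * z_tenure)))

zs = [-3, -2, -1, 0, 1, 2, 3]  # equal steps of one standard deviation
probs = [churn_prob(z) for z in zs]
drops = [before - after for before, after in zip(probs[:-1], probs[1:])]

for z, p in zip(zs, probs):
    print(f"tenure z={z:+d}  churn probability={p:.3f}")
print("drop per step:", [round(d, 3) for d in drops])
```

Each step changes the log-odds by exactly 1.337, but the probability drop is largest where the curve crosses 0.5 and shrinks toward zero at the flat end.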


Practice Exercises

Now it is your turn. Try these before checking the hints. Assume the fitted model, feature_cols, and X_test_scaled from the lesson are available.

Exercise 1: Odds Ratio for a Two-Year Contract

Find the coefficient for Contract_Two year, convert it to an odds ratio, and write one sentence explaining what it means for a customer’s churn risk.

import numpy as np

# Your code here

Hint

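Build a lookup with coefs = dict(zip(feature_cols, model.coef_[0])), then exponentiate: np.exp(coefs["Contract_Two year"]). You should get about 0.499. Because the indicator was standardized, that is the multiplier on the odds of churn per standard deviation of the Contract_Two year column; a full switch from the month-to-month baseline to a two-year contract spans more than one standard deviation, so it cuts the odds by more than half.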

Exercise 2: Rank Features by Effect Size

A coefficient’s magnitude (distance from zero) measures how strongly it affects churn, regardless of direction. Print the features sorted by the absolute value of their coefficients, largest first.

import numpy as np

# Your code here

Hint

Use np.argsort(np.abs(model.coef_[0]))[::-1] to get indices ordered by absolute coefficient, then loop over them and print feature_cols[i] alongside the coefficient. tenure (coef -1.337) should come out on top, with Partner_Yes (coef +0.019) at the bottom.
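If you want to check your ordering without the fitted model at hand, the same ranking can be sketched in plain Python from the rounded coefficients printed in the lesson. This is one possible approach, not the only one.

```python
# Rounded coefficients copied from the lesson's printed output
coefs = {
    "tenure": -1.337, "Contract_Two year": -0.695, "InternetService_No": -0.407,
    "Contract_One year": -0.329, "Dependents_Yes": -0.124, "MonthlyCharges": -0.113,
    "Partner_Yes": 0.019, "SeniorCitizen": 0.118, "PaperlessBilling_Yes": 0.208,
    "InternetService_Fiber optic": 0.499, "TotalCharges": 0.631,
}

# Sort by distance from zero, largest first, ignoring the sign
ranked = sorted(coefs.items(), key=lambda item: abs(item[1]), reverse=True)

for name, coef in ranked:
    print(f"{name:<28} coef={coef:+.3f}  |coef|={abs(coef):.3f}")
```

Sorting by absolute value mixes protective and risk-raising features in one list, so keep the sign column visible when you read it.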

Exercise 3: Highest-Risk Customers

Use predict_proba to get each test customer’s churn probability (the second column), and print how many test customers have a predicted churn probability above 0.5.

import numpy as np

# Your code here

Hint

The churn probability is column index 1: churn_probs = model.predict_proba(X_test_scaled)[:, 1]. Then count with (churn_probs > 0.5).sum(). This is a preview of the next lesson, where you will turn these probabilities into yes/no predictions with a threshold.


Summary

Excellent work! You fitted a real logistic regression model and, more importantly, learned to translate its parameters into plain business language. Let’s review what you learned.

Key Concepts

Fitting the Model

  • LogisticRegression uses the same scikit-learn pattern as every other model: instantiate, then .fit()
  • The fitted intercept lives in model.intercept_ and the feature coefficients in model.coef_
  • Standardizing features first makes the coefficients directly comparable to one another

The Log-Odds Scale

  • Logistic regression is linear on the log-odds (logit), not on the probability
  • Three views of the same outcome: probability $p$, odds $\frac{p}{1-p}$, and log-odds $\log\frac{p}{1-p}$
  • The intercept is the log-odds of the outcome when all features are zero

Odds Ratios

  • Exponentiating a coefficient gives an odds ratio: $e^{\beta_j}$
  • Odds ratio above 1 increases the odds; below 1 decreases them; equal to 1 means no effect
  • An odds ratio multiplies the odds, it is not a probability ratio

Reading the Model

  • The sign of a coefficient gives the direction of the effect; its magnitude gives the strength
  • For churn, tenure and a two-year contract are the strongest protective factors, while TotalCharges and fiber internet drive churn up the most
  • predict_proba converts the log-odds back into a probability via the sigmoid, producing an S-shaped curve

Why This Matters

Most powerful models are hard to explain. Logistic regression is the rare exception that is both genuinely useful and fully interpretable, which is why it remains a workhorse in business, medicine, and credit scoring decades after fancier methods appeared. When a regulator asks “why did the model flag this customer,” or a manager asks “what should we actually change,” an odds ratio gives you a precise, defensible answer.

The skill you practiced here, moving fluently between coefficients, log-odds, odds ratios, and probabilities, is exactly what separates someone who can run .fit() from someone who can explain a model to a decision-maker. Every interpretation you wrote, from “tenure cuts churn odds to a quarter” to “higher total charges nearly double them,” came straight from numbers the model handed you. That is the quiet power of a well-understood linear model.


Next Steps

You can now fit a logistic regression and explain exactly what it has learned. But interpreting coefficients does not tell you whether the model’s predictions are any good. In the next lesson, you will measure that directly with a confusion matrix and the metrics built on it.

Continue to Lesson 3 - Evaluating Logistic Regression Models

Measure how well your classifier actually predicts using the confusion matrix, accuracy, precision, and recall.


Keep Building Your Skills

You have learned to read a logistic regression the way a practitioner does, turning raw coefficients into a ranked story of what drives an outcome. Keep that habit as you go: whenever you fit a model, ask not only “how accurate is it” but “what is it telling me about the problem.” The odds ratio will follow you well beyond this course, into any field where decisions hinge on understanding why, not just what.