Lesson 3 - Checking Linear Regression Fit

Checking Whether the Line Actually Fits

In the previous lessons you fit a linear regression model and learned how to read its coefficients. Fitting is the easy part: scikit-learn will happily draw a line through any cloud of points you hand it, whether or not a line is a sensible description of the data. The harder and more important question is the one this lesson answers: is the model any good?

Answering that question well is what separates a careful analyst from someone who just calls .fit() and trusts the output. You will learn to interrogate a model from three angles: the assumptions it quietly relies on, the size of its errors, and the fraction of variation it manages to explain.

By the end of this lesson, you will be able to:

Explain the assumptions linear regression makes about its errors
Compute residuals and read a residual plot to check those assumptions
Calculate and interpret mean squared error (MSE) and root mean squared error (RMSE)
Calculate and interpret the R-squared (coefficient of determination) score
Combine these diagnostics to decide whether linear regression fits your data

You should be comfortable with basic Python, pandas, and the scikit-learn fit/predict pattern from the earlier lessons. Let’s begin.

Why Errors Are the Thing to Inspect

A linear regression model assumes the outcome can be split into exactly two pieces: a part the predictors explain, and everything left over. Written out, the model for an observation looks like this:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \epsilon

Here $\beta_0$ is the intercept, each $\beta_j$ is a coefficient, and $\epsilon$ is the error: the part of the outcome that the predictors genuinely cannot account for. The model is only trustworthy if that leftover error behaves the way the math assumes it does.

We never observe the true errors $\epsilon$ directly. What we can observe, after fitting, are the residuals, our best estimate of those errors. The residual for observation $i$ is simply the actual value minus the predicted value:

r_i = Y_i - \hat{Y}_i

where $\hat{Y}_i$ (read “y-hat”) is the model’s prediction for that row. If the residuals look the way well-behaved errors should look, we gain confidence that a straight-line model is appropriate. If they show a clear pattern, that is the data warning us that a line is the wrong shape.

Linear regression assumes two things about the errors, and therefore expects the same of the residuals:

Zero mean. On average the errors cancel out: $E[\epsilon] = 0$ . The model is not systematically too high or too low.
Constant variance. The spread of the errors stays the same no matter how large the prediction is: $\mathrm{Var}[\epsilon] = \sigma^2$ . This property has a name worth knowing: homoskedasticity. Its opposite, where the spread grows or shrinks with the prediction, is heteroskedasticity.

The rest of this lesson is mostly about checking these two assumptions and then quantifying how well the model does once they hold.

Loading the Data

You will diagnose a model that predicts a car’s price from five physical measurements, using the real automobiles dataset (the classic UCI automobile imports data). It has 159 rows, 26 columns, and no missing values, which keeps the focus squarely on model checking.

import pandas as pd

df = pd.read_csv("automobiles.csv")  # download: https://datatweets.com/datasets/automobiles.csv

print("Shape:", df.shape)
print(df["price"].describe()[["mean", "min", "max"]].round(0))
# Output:
# Shape: (159, 26)
# mean    11446.0
# min      5118.0
# max     35056.0
# Name: price, dtype: float64

Prices range from about $5,118 to $35,056, with a mean near $11,446. Keep that mean in mind: it is the yardstick you will measure every error against later.

A Data Dictionary

You will use five numeric predictors. Here is what each one means, plus the target.

Column	Type	Meaning
`price`	int	Target: retail price in US dollars
`engine_size`	int	Engine displacement in cc
`horsepower`	int	Engine power output
`curb_weight`	int	Weight of the car with no occupants, in pounds
`width`	float	Body width in inches
`highway_mpg`	int	Fuel efficiency on the highway (miles per gallon)

These columns were chosen because each plausibly drives price: bigger, heavier, more powerful cars cost more, while thriftier cars (high highway_mpg) tend to be cheaper.

Preparing Features, Splitting, and Scaling

The recipe here matches the multiple-regression model from the previous lesson exactly, so the diagnostics you compute apply to a model you already understand. You select the five features, hold out 25 percent of the rows as a test set, and standardize the features so every coefficient lives on the same scale.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

features = ["engine_size", "horsepower", "curb_weight", "width", "highway_mpg"]
X = df[features]
y = df["price"]

# Hold out 25% for an honest test of fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Standardize: fit the scaler on TRAIN only, then apply to both
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LinearRegression()
model.fit(X_train_scaled, y_train)

for name, coef in zip(features, model.coef_):
    print(f"  {name:12s} coef = {coef:8.1f}")
print(f"  intercept = {model.intercept_:.1f}")
# Output:
#   engine_size  coef =   1808.4
#   horsepower   coef =    336.5
#   curb_weight  coef =   1935.4
#   width        coef =   1892.0
#   highway_mpg  coef =     82.6
#   intercept = 11442.5

The golden rule of scaling

Fit the scaler on the training data only, then apply the same transform to the test set. If you fit on the full dataset, information about the test rows leaks into training, and every diagnostic you compute afterward becomes too optimistic. The model must never see the test set before evaluation.

Because the features are standardized, the coefficients are directly comparable: curb_weight and engine_size move price the most, highway_mpg the least. The intercept of 11442.5 is close to the average price, which makes sense for standardized inputs. That is the model under inspection. Now you check whether it deserves your trust.

Computing Residuals

The first diagnostic needs predictions, and predictions are one method call away. You ask the fitted model for its predicted prices, then subtract them from the actual prices to get residuals.

predictions = model.predict(X_test_scaled)
residuals = y_test - predictions

print("First five residuals:")
print(residuals.head().round(0).values)
# Output:
# First five residuals:
# [ -642.  1188.  -311.  2954.  -1377.]

Each residual is a dollar amount: how far above (positive) or below (negative) the true price the model landed for that car. A residual of 2954 means the model under-predicted that car’s price by almost $3,000; a residual of -1377 means it over-predicted by about $1,377.

Checking the Zero-Mean Assumption

The first assumption says the residuals should average out to (approximately) zero, meaning the model is not biased high or low. Checking it is one line.

import numpy as np

residual_mean = np.mean(residuals)
print(f"Mean of residuals: {residual_mean:.2f}")
# Output:
# Mean of residuals: 95.42

Compared to a mean price of about $11,446, a residual mean of roughly $95 is tiny: less than one percent of the typical price. The model is essentially unbiased on the test set.

Why the training residuals always average to zero

For any linear regression fit with an intercept, the mean of the residuals on the training data is mathematically forced to be exactly zero. That is a side effect of how the intercept is chosen, so checking it on the training set is redundant. On the test set, as here, the mean is not guaranteed to be zero, which makes it a genuine check that the model generalizes without bias.

Reading the Residual Plot

Zero mean is necessary but not sufficient. A model can be unbiased on average while still being badly wrong in a structured way, for example over-predicting cheap cars and under-predicting expensive ones in equal measure. To catch that, you need to look at how the residuals are distributed across the range of predictions.

The tool for this is a residual plot: a scatter plot with the model’s predicted values on the x-axis and the residuals on the y-axis. You are looking for one specific thing: a shapeless, horizontal band of points centered on zero, with roughly the same vertical spread everywhere.

import matplotlib.pyplot as plt

plt.scatter(predictions, residuals)
plt.axhline(0, color="black", linewidth=1)
plt.xlabel("Predicted price ($)")
plt.ylabel("Residual ($)")
plt.title("Residuals vs predicted price")
plt.show()

The figure below shows the result for our model.

Residual plot of residuals versus predicted price — A good model leaves residuals scattered randomly around zero.

The points scatter fairly evenly above and below the zero line across most of the price range, with no obvious funnel or curve. That is what a healthy residual plot looks like, and it tells you a straight-line model is a reasonable choice here.

What Violations Look Like

The reason this check matters is that an unhealthy residual plot has a recognizable shape. Train your eye on these three patterns:

GOOD (homoskedastic)      FUNNEL (heteroskedastic)     CURVE (wrong shape)
                                                        
 r |  . . . . . . .        r |        . . :             r |  .          .
   |. . . . . . . .          |     . . : :                |   .       .
 0 |-.-.-.-.-.-.-.-.        0 |--.-:-:-:-:-:--          0 |-----.-.-.-----
   | . . . . . . .           |  . : : :                  |  .         .
   |  . . . . . .            |    . . :                  | .           .
   +---------------          +---------------            +---------------
        Y-hat                     Y-hat                       Y-hat

Good. An even horizontal band. The spread is constant and there is no trend. The assumptions hold.
Funnel. The spread grows (or shrinks) as predictions get larger. This is heteroskedasticity: the constant-variance assumption is violated, and the model is less reliable for some price ranges than others.
Curve. The residuals bend, often arcing below zero in the middle and above at the ends. This signals that the true relationship is non-linear, so a straight line is the wrong shape. You might fix it by adding a squared term or transforming a feature.

This is a qualitative check. You are not computing a number; you are confirming that you do not see the two bad patterns. Our automobiles residual plot passes.

Use predicted values, not a single feature

With one predictor you could plot residuals against that predictor. With several predictors, plot residuals against the model’s predicted values instead. The prediction summarizes all the features at once, so a single residual-versus-prediction plot checks the whole model rather than one variable at a time.

Measuring Error Size: MSE and RMSE

The residual plot tells you whether the shape of the errors is acceptable. It does not tell you how big they are. For that you need a single summary number.

Recall that fitting chose coefficients to minimize the sum of squared errors. We can turn that same quantity into an average, the mean squared error (MSE):

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

MSE is the average of the squared residuals. Squaring keeps every term positive and punishes large misses harder than small ones. The trouble is its units: because we squared dollars, MSE is in dollars squared, which is impossible to interpret directly. The fix is to take the square root, giving the root mean squared error (RMSE):

\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}

RMSE is back in dollars, so you can read it as “the typical size of a prediction error.” scikit-learn computes both for you. Always evaluate on the test set, because a model is optimized to minimize error on the training set and will flatter itself there.

from sklearn.metrics import mean_squared_error, mean_absolute_error

test_mse = mean_squared_error(y_test, predictions)
test_rmse = np.sqrt(test_mse)
test_mae = mean_absolute_error(y_test, predictions)

print(f"Test MSE:  {test_mse:,.0f}")
print(f"Test RMSE: ${test_rmse:,.0f}")
print(f"Test MAE:  ${test_mae:,.0f}")
# Output:
# Test MSE:  5,417,000
# Test RMSE: $2,327
# Test MAE:  $1,863

An RMSE of about $2,327 means the model’s price predictions are typically off by a little over two thousand dollars. Against an average price of $11,446, that is roughly a 20 percent typical error: not perfect, but a usable estimate.

RMSE vs MAE

You also printed the mean absolute error (MAE), the average of the absolute residuals:

\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right|

MAE here is about $1,863, smaller than RMSE. That gap is informative. Because RMSE squares the residuals before averaging, it weights a few large misses heavily, so RMSE is always greater than or equal to MAE. When RMSE sits well above MAE, a handful of outliers are inflating the error; when they are close, the errors are fairly uniform in size. Our $2,327 versus $1,863 says there are a few cars the model misses badly, but no single catastrophe dominates.

Which error metric should you report?

Report RMSE when large errors are especially costly and you want them penalized; report MAE when you want a robust, outlier-resistant sense of the typical miss. They answer slightly different questions, and showing both, as we did, gives a fuller picture than either alone.

Measuring Explained Variance: R-Squared

RMSE tells you the size of the errors in dollars, but it does not tell you whether that size is good relative to how much prices vary in the first place. An RMSE of $2,327 would be excellent if prices ranged over $200,000 and mediocre if they barely moved. The coefficient of determination, written $R^2$ , captures exactly this idea: the proportion of the variation in the outcome that the model explains.

The reasoning starts from a neat fact. The total variance in the outcome splits into the variance the model captures plus the variance left in the errors:

\mathrm{Var}(Y) = \mathrm{Var}(\hat{Y}) + \mathrm{Var}(\epsilon)

From there, $R^2$ is defined as one minus the share of variance that remains unexplained:

R^2 = 1 - \frac{\mathrm{Var}(\epsilon)}{\mathrm{Var}(Y)} = 1 - \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2}

Read it this way:

$R^2 = 1$ : the model explains everything; residuals are all zero.
$R^2 = 0$ : the model explains nothing; it does no better than always predicting the mean $\bar{Y}$ .
$R^2 < 0$ : the model is worse than predicting the mean (yes, this can happen on a test set).

scikit-learn provides r2_score, and the fitted model’s .score() method returns the same value.

from sklearn.metrics import r2_score

test_r2 = r2_score(y_test, predictions)
print(f"Test R^2: {test_r2:.3f}")
print(f"Same via .score(): {model.score(X_test_scaled, y_test):.3f}")
# Output:
# Test R^2: 0.793
# Same via .score(): 0.793

An $R^2$ of 0.793 means the five features together explain about 79 percent of the variation in car prices on data the model has never seen. The remaining 21 percent is driven by things not in our five columns: brand prestige, options, condition, and plain market noise. For five physical measurements, explaining four-fifths of price is a genuinely strong fit.

A high R-squared is not the whole story

$R^2$ measures explained variance, not correctness of the model’s shape. A curved relationship can still post a respectable $R^2$ while the residual plot screams that a line is wrong. Always pair $R^2$ with the residual plot: the number tells you how much variation is captured, the plot tells you whether you captured it the right way.

Seeing the Fit Directly

One last view ties the diagnostics together. If you plot each car’s predicted price against its actual price, perfect predictions would land exactly on the diagonal line $\hat{Y} = Y$ . The tighter the points hug that diagonal, the higher the $R^2$ .

Scatter plot of predicted price versus actual price with the ideal diagonal line — Predicted versus actual price; points near the diagonal mean accurate predictions.

The points cluster close to the diagonal across the whole price range, which is the visual signature of an $R^2$ near 0.8. A couple of expensive cars sit a bit off the line, the same outliers that pushed RMSE above MAE earlier. Every diagnostic in this lesson is telling a consistent story.

For completeness, here are the standardized coefficients of the very model you just validated, so you can connect “is it a good fit?” back to “what is it actually using?”

Bar chart of standardized regression coefficients for the five features — Standardized coefficients: curb weight and engine size move price the most.

Putting the Diagnostics Together

Here is the full evaluation workflow condensed into one runnable block. This is the template you will reuse to check any regression model.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# 1. Load and select features
df = pd.read_csv("automobiles.csv")  # download: https://datatweets.com/datasets/automobiles.csv
features = ["engine_size", "horsepower", "curb_weight", "width", "highway_mpg"]
X, y = df[features], df["price"]

# 2. Split, then scale (fit on train only)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 3. Fit
model = LinearRegression().fit(X_train, y_train)

# 4. Diagnose on the TEST set
preds = model.predict(X_test)
residuals = y_test - preds

print(f"Residual mean: ${np.mean(residuals):,.0f}")   # zero-mean check
print(f"RMSE:          ${np.sqrt(mean_squared_error(y_test, preds)):,.0f}")
print(f"MAE:           ${mean_absolute_error(y_test, preds):,.0f}")
print(f"R^2:           {r2_score(y_test, preds):.3f}")
# Output:
# Residual mean: $95
# RMSE:          $2,327
# MAE:           $1,863
# R^2:           0.793

In a dozen lines you loaded real data, fit a model, and ran every check that matters: the residual plot for shape, RMSE and MAE for error size, and $R^2$ for explained variance. A model that passes all four is one you can defend.

Practice Exercises

Try these before checking the hints.

Exercise 1: Plot Residuals for a Weaker Model

Fit a linear regression using only highway_mpg as the predictor of price (still standardized, same split), then build a residual plot. Compare its shape to the five-feature model’s plot. Does it look as clean?

import pandas as pd
df = pd.read_csv("automobiles.csv")

# Your code here

Hint

Set X = df[["highway_mpg"]], run the same split, scale, and fit. Compute residuals = y_test - model.predict(X_test_scaled) and call plt.scatter(model.predict(X_test_scaled), residuals). With a single weak predictor, expect a wider, less even band of residuals: the visual sign of a poorer fit.

Exercise 2: Compute RMSE and R-squared Yourself

Without using mean_squared_error or r2_score, compute the test RMSE and $R^2$ for the five-feature model using only NumPy, and confirm they match $2,327 and 0.793.

# Reuse predictions and y_test from the lesson
# Your code here

Hint

RMSE is np.sqrt(np.mean((y_test - predictions) ** 2)). For $R^2$ , compute the residual sum of squares ss_res = np.sum((y_test - predictions) ** 2) and the total sum of squares ss_tot = np.sum((y_test - y_test.mean()) ** 2), then r2 = 1 - ss_res / ss_tot.

Exercise 3: Compare Train and Test R-squared

Compute $R^2$ on both the training set and the test set for the five-feature model. Is the training score higher? Explain in one sentence why that is expected.

# Your code here

Hint

Call model.score(X_train_scaled, y_train) and model.score(X_test_scaled, y_test). The training $R^2$ is usually a little higher because the model was optimized on exactly that data, which is why you must always report the test score as your honest measure of fit.

Summary

You learned how to judge a linear regression model rather than just fit one. Let’s review.

Key Concepts

Residuals and Assumptions

A residual is the actual value minus the predicted value, $r_i = Y_i - \hat{Y}_i$ , the model’s stand-in for the unobservable error
Linear regression assumes the errors have zero mean (no systematic bias) and constant variance (homoskedasticity)
A residual plot of residuals versus predictions should show a shapeless horizontal band centered on zero; a funnel signals heteroskedasticity and a curve signals a non-linear relationship

Error Metrics

MSE averages the squared residuals; RMSE takes its square root to return to the original units (dollars here)
MAE averages the absolute residuals and is more robust to outliers; RMSE above MAE means a few large misses
Our model: RMSE about $2,327 and MAE about $1,863 against a mean price of $11,446

R-Squared

$R^2$ is the proportion of outcome variance the model explains, from 1 (perfect) down through 0 (no better than the mean) and even negative on a test set
Our model reaches $R^2 = 0.793$ , explaining about 79 percent of price variation on unseen data
A high $R^2$ does not prove the model’s shape is right; always read it alongside the residual plot

Discipline

Compute every diagnostic on the test set, and fit the scaler on the training set only, to keep the evaluation honest

Why This Matters

Every regression project ends with the same question from a stakeholder: “Can we trust this model?” The diagnostics in this lesson are how you answer responsibly. The residual plot tells you whether a line is even the right shape, RMSE and MAE tell you how large the mistakes are in real-world units, and $R^2$ tells you how much of the story the model captures. Together they let you move past “the code ran” to “the model is sound,” which is the judgment professional data work is built on.

Next Steps

You can now diagnose a fitted model from every important angle. In the next lesson you will put a model to work end to end, training on real data and generating predictions on a held-out test set.

Continue to Lesson 4 - Applying Linear Regression Models

Train on real data and make predictions on a held-out test set.

Back to Module Overview

Return to the Regression module overview.

Keep Sharpening Your Judgment

Fitting a model is a single line of code; deciding whether to believe it is a skill. The habit you built here, of pairing a visual check with quantitative metrics and always measuring on unseen data, transfers to every model you will ever train, linear or not. Make these four checks reflexive, and you will catch broken models long before they reach a decision-maker’s desk.

Lesson 2 - Interpreting Regression Parameters

Lesson 4 - Applying Linear Regression Models

Courses

DATATWEETS

Title here

Lesson 3 - Checking Linear Regression Fit

Checking Whether the Line Actually Fits

Why Errors Are the Thing to Inspect

Loading the Data

A Data Dictionary

Preparing Features, Splitting, and Scaling

Computing Residuals

Checking the Zero-Mean Assumption

Reading the Residual Plot

What Violations Look Like

Measuring Error Size: MSE and RMSE

RMSE vs MAE

Measuring Explained Variance: R-Squared

Seeing the Fit Directly

Putting the Diagnostics Together

Practice Exercises

Exercise 1: Plot Residuals for a Weaker Model

Exercise 2: Compute RMSE and R-squared Yourself

Exercise 3: Compare Train and Test R-squared

Summary

Key Concepts

Why This Matters

Next Steps

Continue to Lesson 4 - Applying Linear Regression Models

Back to Module Overview

Keep Sharpening Your Judgment

Lesson 3 - Checking Linear Regression Fit

Checking Whether the Line Actually Fits#

Why Errors Are the Thing to Inspect#

Loading the Data#

A Data Dictionary#

Preparing Features, Splitting, and Scaling#

Computing Residuals#

Checking the Zero-Mean Assumption#

Reading the Residual Plot#

What Violations Look Like#

Measuring Error Size: MSE and RMSE#

RMSE vs MAE#

Measuring Explained Variance: R-Squared#

Seeing the Fit Directly#

Putting the Diagnostics Together#

Practice Exercises#

Exercise 1: Plot Residuals for a Weaker Model#

Exercise 2: Compute RMSE and R-squared Yourself#

Exercise 3: Compare Train and Test R-squared#

Summary#

Key Concepts#

Why This Matters#

Next Steps#

Continue to Lesson 4 - Applying Linear Regression Models

Back to Module Overview

Keep Sharpening Your Judgment#

Checking Whether the Line Actually Fits

Why Errors Are the Thing to Inspect

Loading the Data

A Data Dictionary

Preparing Features, Splitting, and Scaling

Computing Residuals

Checking the Zero-Mean Assumption

Reading the Residual Plot

What Violations Look Like

Measuring Error Size: MSE and RMSE

RMSE vs MAE

Measuring Explained Variance: R-Squared

Seeing the Fit Directly

Putting the Diagnostics Together

Practice Exercises

Exercise 1: Plot Residuals for a Weaker Model

Exercise 2: Compute RMSE and R-squared Yourself

Exercise 3: Compare Train and Test R-squared

Summary

Key Concepts

Why This Matters

Next Steps

Keep Sharpening Your Judgment