Lesson 6 - Matrix Algebra

Welcome to Matrix Algebra

In the last lesson you worked with vectors, the basic units of data in machine learning. This lesson zooms out one level to matrices: rectangular grids of numbers that you can read in two equally important ways. A matrix is a table that stacks many vectors together, and it is also a machine that transforms space. Both views matter, and by the end you will see how they connect to the inner workings of a neural network.

By the end of this lesson, you will be able to:

Read a matrix as both a table of numbers and a transformation of space
Multiply two matrices by hand and verify the result with NumPy’s @ operator
Explain why matrix multiplication is not commutative and what the shape rules are
Interpret the determinant as the factor by which a matrix scales area
Use the identity matrix and the transpose, and connect a matrix multiply to a neural network layer

You should be comfortable with the vector ideas from the previous lesson (dot products in particular) and basic Python with NumPy. Let’s begin.

Two Ways to See a Matrix

A matrix is a rectangular array of numbers arranged in rows and columns. We describe its size by its shape, written rows by columns. A matrix with two rows and two columns is a $2 \times 2$ matrix; one with three rows and two columns is $3 \times 2$ .

Here is a small matrix $M$ :

M = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}

There are two ways to think about this object, and switching between them fluently is one of the most useful skills in machine learning math.

The first way is the table view. The matrix is just data. Each row might be one customer and each column one measurement, exactly like a spreadsheet or a pandas DataFrame. This is how you store a dataset: a matrix with one row per observation and one column per feature.

The second way is the transformation view. The matrix is a function that takes a vector in and sends a new vector out. Feed it a point in the plane and it moves that point somewhere else, stretching, rotating, or shearing the whole space in a consistent way. This is the view that explains what happens inside a model.

Both views describe the same numbers. The table view tells you what the matrix is; the transformation view tells you what the matrix does. We will spend this lesson moving between them.

Matrices stack vectors

You can read a matrix as a collection of column vectors standing side by side, or as a stack of row vectors. The matrix $M$ above is the two column vectors $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 4 \end{bmatrix}$ placed next to each other. Keeping this in mind makes matrix multiplication far less mysterious.

Let’s create some matrices in NumPy. A matrix is just a two-dimensional array: a list of lists where each inner list is one row.

import numpy as np

# no dataset needed - numpy only

M = np.array([[1, 2],
              [3, 4]])

N = np.array([[5, 6],
              [7, 8]])

print("M shape:", M.shape)
print("M:\n", M)
# Output:
# M shape: (2, 2)
# M:
#  [[1 2]
#  [3 4]]

The .shape attribute reports (2, 2): two rows, two columns. Everything that follows builds on this simple structure.
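The column and row readings of $M$ described earlier can be checked directly with NumPy slicing. This is a minimal sketch; the variable names (first_col, rebuilt, and so on) are illustrative and not part of the lesson's own code.

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])

# Column view: take every row, pick one column index
first_col = M[:, 0]    # the column vector (1, 3)
second_col = M[:, 1]   # the column vector (2, 4)

# Row view: pick one row index
first_row = M[0]       # the row vector (1, 2)

# Placing the two columns side by side rebuilds M
rebuilt = np.column_stack([first_col, second_col])

print("Columns:", first_col, second_col)
print("First row:", first_row)
print("Rebuilt equals M:", np.array_equal(rebuilt, M))
```

Slicing with M[:, j] returns a one-dimensional array, so NumPy prints a column the same way it prints a row; the "column" is a matter of how you read it.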

Adding and Scaling Matrices

Before multiplication, the easy operations. Because a matrix is a grid of numbers, you add two matrices by adding the numbers in matching positions. This only works when both matrices have the same shape, since every element needs a partner.

M + N = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}

Multiplying a matrix by a single number, a scalar, is just as direct: multiply every element by that number.

print("M + N:\n", M + N)
# Output:
# M + N:
#  [[ 6  8]
#  [10 12]]

print("3 * M:\n", 3 * M)
# Output:
# 3 * M:
#  [[ 3  6]
#  [ 9 12]]

These element-wise operations are intuitive, and NumPy handles them with plain + and *. The interesting operation, the one that powers machine learning, is something different.

Matrix Multiplication

Matrix multiplication is not element-wise. It does not pair up matching positions and multiply them. Instead, each entry of the product comes from a dot product between a row of the first matrix and a column of the second.

This is exactly the dot product you learned in the last lesson, now applied many times. To compute the entry in row $i$ , column $j$ of the product, you take row $i$ of the left matrix, column $j$ of the right matrix, multiply them element by element, and add up the results.

Let’s work out $M N$ by hand for our two matrices:

M N = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}

Take it one entry at a time. The top-left entry is row 1 of $M$ dotted with column 1 of $N$ :

(1 \times 5) + (2 \times 7) = 5 + 14 = 19

The top-right entry is row 1 of $M$ dotted with column 2 of $N$ :

(1 \times 6) + (2 \times 8) = 6 + 16 = 22

The bottom-left entry is row 2 of $M$ dotted with column 1 of $N$ :

(3 \times 5) + (4 \times 7) = 15 + 28 = 43

And the bottom-right entry is row 2 of $M$ dotted with column 2 of $N$ :

(3 \times 6) + (4 \times 8) = 18 + 32 = 50

Putting the four entries together gives the answer:

M N = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}

Now verify it with NumPy. The @ operator performs matrix multiplication (the equivalent function is np.matmul, and for two-dimensional arrays np.dot gives the same result, but @ reads more clearly).

product = M @ N

print("M @ N:\n", product)
# Output:
# M @ N:
#  [[19 22]
#  [43 50]]

The numbers match your hand calculation exactly. Whenever you are unsure about a matrix product, working a couple of entries by hand and checking against @ is the fastest way to build confidence.
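To make the row-times-column rule concrete, here is one way to spell it out with explicit loops. The helper name matmul_by_hand is hypothetical, written only for illustration; in practice you would use @, which is far faster.

```python
import numpy as np

def matmul_by_hand(A, B):
    """Multiply two 2-D arrays using explicit row-by-column dot products."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            # Entry (i, j): row i of A dotted with column j of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

M = np.array([[1, 2],
              [3, 4]])
N = np.array([[5, 6],
              [7, 8]])

by_hand = matmul_by_hand(M, N)
print("Loop result matches @:", np.array_equal(by_hand, M @ N))
```

The inner sum over k is the same dot product you computed by hand for each of the four entries.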

@ is not the same as *

In NumPy, M * N multiplies element by element (position against matching position), while M @ N performs true matrix multiplication with dot products. They give completely different results. When you mean matrix multiplication, always reach for @.
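A quick side-by-side comparison on the lesson's matrices shows how far apart the two operators are. This is a small sketch; the variable names are illustrative.

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
N = np.array([[5, 6],
              [7, 8]])

elementwise = M * N   # position against matching position: 1*5, 2*6, 3*7, 4*8
matrix_prod = M @ N   # rows of M dotted with columns of N

print("M * N entries:", elementwise.tolist())
print("M @ N entries:", matrix_prod.tolist())
print("Same result?", np.array_equal(elementwise, matrix_prod))
```

The top-left entry alone makes the difference visible: element-wise gives $1 \times 5 = 5$, while the matrix product gives $19$.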

The Shape Rule

Matrix multiplication only works when the shapes line up. To compute $A B$ , the number of columns in $A$ must equal the number of rows in $B$ . The result has as many rows as $A$ and as many columns as $B$ :

(m \times n) \times (n \times p) \rightarrow (m \times p)

The two inner numbers must agree (that shared $n$ is what each dot product runs over), and they vanish from the result, leaving the two outer numbers as the new shape. If the inner numbers do not match, the product is undefined and NumPy raises a ValueError.
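The shape rule is easy to test with placeholder matrices. The sketch below uses arrays filled with ones, chosen only for their shapes; A and B here are illustrative and unrelated to the exercise matrices later in the lesson.

```python
import numpy as np

A = np.ones((3, 2))   # 3 rows, 2 columns
B = np.ones((2, 4))   # 2 rows, 4 columns

# Inner numbers agree (2 and 2), so the product exists with the outer shape
C = A @ B
print("A @ B shape:", C.shape)

# Swapped order: (2 x 4) times (3 x 2) has inner numbers 4 and 3
mismatch_raised = False
try:
    B @ A
except ValueError as err:
    mismatch_raised = True
    print("B @ A is undefined:", err)
```

Checking .shape on both operands before multiplying is usually the quickest way to diagnose this kind of error.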

Order Matters

With ordinary numbers, $3 \times 5$ and $5 \times 3$ give the same answer. Matrix multiplication is different: in general $M N \neq N M$ . Swapping the order usually changes the result, and sometimes the swapped product does not even exist because the shapes no longer line up. Try it with our matrices.

print("M @ N:\n", M @ N)
print("N @ M:\n", N @ M)
# Output:
# M @ N:
#  [[19 22]
#  [43 50]]
# N @ M:
#  [[23 34]
#  [31 46]]

The two products are clearly different. We say matrix multiplication is not commutative, and keeping track of order is essential whenever you chain transformations together.
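Order matters for chained transformations as well. The sketch below uses two matrices that are not from the lesson and are assumed here purely for illustration: R rotates the plane a quarter turn counterclockwise, and S stretches it horizontally by a factor of 2.

```python
import numpy as np

R = np.array([[0, -1],
              [1,  0]])   # quarter-turn rotation (illustrative)
S = np.array([[2, 0],
              [0, 1]])    # horizontal stretch by 2 (illustrative)
v = np.array([1, 0])

# The matrix closest to the vector acts first
rotate_then_stretch = S @ (R @ v)   # R first, then S
stretch_then_rotate = R @ (S @ v)   # S first, then R

print("Rotate, then stretch:", rotate_then_stretch)
print("Stretch, then rotate:", stretch_then_rotate)

# Grouping does not matter, only order: (S @ R) @ v equals S @ (R @ v)
print("Grouping is free:", np.array_equal((S @ R) @ v, rotate_then_stretch))
```

Rotating first sends $(1, 0)$ to $(0, 1)$, which the horizontal stretch leaves alone; stretching first sends it to $(2, 0)$, which the rotation turns into $(0, 2)$.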

A Matrix as a Transformation of Space

Now for the view that makes matrices come alive. Multiply a matrix by a vector and you get a new vector. If you do this to every point in the plane, the matrix reshapes the entire space, and you can watch what it does by tracking a single simple shape: the unit square, the square with corners at $(0,0)$ , $(1,0)$ , $(1,1)$ , and $(0,1)$ .

Consider this transformation matrix, a shear combined with a slight vertical squeeze:

T = \begin{bmatrix} 1.0 & 0.5 \\ 0.0 & 0.88 \end{bmatrix}

To see where the square goes, multiply $T$ by each corner. The corner $(0,0)$ stays put because $T$ times the zero vector is zero. The other corners move:

T = np.array([[1.0, 0.5],
              [0.0, 0.88]])

# The four corners of the unit square, as columns
square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]])

transformed = T @ square

print("Transformed corners (as columns):\n", transformed)
# Output:
# Transformed corners (as columns):
#  [[0.   1.   1.5  0.5 ]
#  [0.   0.   0.88 0.88]]

Reading the columns, the corners land at $(0,0)$ , $(1,0)$ , $(1.5, 0.88)$ , and $(0.5, 0.88)$ . The square has been pushed sideways into a leaning parallelogram. The bottom edge stays flat, the top edge slides to the right, and the whole shape becomes a touch shorter. That is what “the matrix transforms space” means in concrete terms: every point follows the same rule, so straight lines stay straight and the grid stays evenly spaced, but the whole plane is reshaped.

Figure: The matrix T turns the unit square (left) into a sheared parallelogram (right), mapping each corner to a new position while keeping straight lines straight.

This is the key intuition to carry forward: a matrix is a recipe for moving every point in space at once. Stretching, rotating, reflecting, and shearing are all just different matrices.
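One way to read that recipe straight off the matrix is to look at where the two unit vectors along the axes land: they land on the columns of the matrix. The sketch below checks this for $T$; the names e1, e2, and point are illustrative.

```python
import numpy as np

T = np.array([[1.0, 0.5],
              [0.0, 0.88]])

e1 = np.array([1.0, 0.0])   # unit vector along the horizontal axis
e2 = np.array([0.0, 1.0])   # unit vector along the vertical axis

# Each unit vector lands on the matching column of T
print("T @ e1 is column 0:", np.allclose(T @ e1, T[:, 0]))
print("T @ e2 is column 1:", np.allclose(T @ e2, T[:, 1]))

# Any other point follows from those two columns
point = np.array([2.0, 3.0])
from_columns = 2.0 * T[:, 0] + 3.0 * T[:, 1]
print("Built from columns:", np.allclose(T @ point, from_columns))
```

This ties the two views together: the columns you see in the table are exactly the destinations of the axis directions under the transformation.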

The Determinant: How Much Space Changes

The transformation reshaped the unit square, which originally had an area of exactly 1. How big is the new parallelogram? The answer is captured by a single number called the determinant.

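The determinant of a square matrix tells you the factor by which the matrix scales area (in two dimensions) or volume (in higher dimensions). A determinant of 1 means area is preserved, a determinant of 2 means areas double, and a determinant between 0 and 1 means the shape shrinks. The determinant can also be negative: the size of the number is still the area scale factor, and the minus sign means the matrix flips the plane over, as a reflection does.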

For a $2 \times 2$ matrix, the determinant has a simple formula:

\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc

For our shear matrix $T$ , plug in the values $a = 1.0$ , $b = 0.5$ , $c = 0.0$ , $d = 0.88$ :

\det(T) = (1.0 \times 0.88) - (0.5 \times 0.0) = 0.88

NumPy computes determinants with np.linalg.det.

det_T = np.linalg.det(T)

print("det(T):", round(det_T, 3))
# Output:
# det(T): 0.88

The determinant is $0.88$, so the transformed parallelogram has 88 percent of the area of the original unit square. The shear on its own pushes the square sideways without changing its area; it is the slight vertical squeeze that shrinks the area. This matches the picture: the parallelogram is the same width along the base but a bit shorter than the square.
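You can confirm the area claim without the determinant by measuring the parallelogram directly. The sketch below applies the shoelace formula, a standard polygon-area formula that is not part of the lesson and is brought in here only as an independent check, to the transformed corners.

```python
import numpy as np

T = np.array([[1.0, 0.5],
              [0.0, 0.88]])
square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]])

transformed = T @ square
x, y = transformed   # first row holds x-coordinates, second row y-coordinates

# Shoelace formula: pair each corner with the next one around the shape
x_next = np.roll(x, -1)
y_next = np.roll(y, -1)
area = 0.5 * abs(np.sum(x * y_next - x_next * y))

print("Parallelogram area:", round(area, 3))
print("Matches det(T):", np.isclose(area, np.linalg.det(T)))
```

The measured area and the determinant agree, which is the area-scaling interpretation in action.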

When the determinant is zero

If the determinant is $0$ , the matrix collapses space onto a line or a single point, flattening all area to nothing. Such a matrix throws information away and cannot be undone, which is why a zero determinant signals a matrix that is not invertible. You will meet invertibility properly in a later lesson; for now, just read the determinant as an area scale factor.
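To see a collapse happen, take a matrix whose second row is a multiple of its first. The matrix Z below is an assumed example, not one from the lesson; it sends every corner of the unit square onto the line $y = 2x$.

```python
import numpy as np

Z = np.array([[1, 2],
              [2, 4]])   # second row is twice the first (illustrative)
square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]])

flattened = Z @ square
x, y = flattened

print("det(Z) is zero:", np.isclose(np.linalg.det(Z), 0.0))
print("Flattened corners (as columns):\n", flattened)
print("All corners on y = 2x:", np.array_equal(y, 2 * x))
```

Because the whole square lands on a single line, different input points can end up at the same output, and no transformation can separate them again.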

The Identity and the Transpose

Two special operations come up constantly, and both are easy once you have the transformation view.

The Identity Matrix

The identity matrix is the matrix that changes nothing. It has $1$s running down the main diagonal and $0$s everywhere else. The $2 \times 2$ identity is written $I$ :

I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

Multiplying any matrix or vector by the identity leaves it untouched, the same way multiplying a number by 1 leaves it unchanged. In the transformation view, the identity is the “do nothing” transformation: every point stays exactly where it is, and its determinant is 1 because area is preserved.

I = np.identity(2)

print("I:\n", I)
print("M @ I:\n", M @ I)
# Output:
# I:
#  [[1. 0.]
#  [0. 1.]]
# M @ I:
#  [[1. 2.]
#  [3. 4.]]

Multiplying $M$ by $I$ returns $M$ exactly. The identity is the reference point for “no change,” and it is the goal you aim for when you try to undo a transformation.

The Transpose

The transpose flips a matrix across its main diagonal, turning rows into columns and columns into rows. The transpose of $A$ is written $A^{T}$ . If $A$ has shape $m \times n$ , then $A^{T}$ has shape $n \times m$ .

M = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \qquad M^{T} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}

The first row $[1, 2]$ becomes the first column, and the second row $[3, 4]$ becomes the second column. In NumPy, the transpose is the .T attribute.

print("M.T:\n", M.T)
# Output:
# M.T:
#  [[1 3]
#  [2 4]]

The transpose is more than a formatting trick. It shows up whenever you need to make shapes line up for multiplication, and it appears throughout the formulas of linear regression and neural network training. A useful rule to remember is that the transpose of a product reverses the order: $(AB)^{T} = B^{T} A^{T}$ .
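The shape change is easier to see on a matrix that is not square. In the sketch below, X is a made-up $3 \times 2$ data table (three observations, two features), assumed only for illustration.

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])   # 3 observations, 2 features (illustrative)

print("X shape:  ", X.shape)
print("X.T shape:", X.T.shape)

# X @ X is undefined: (3 x 2) times (3 x 2) has inner numbers 2 and 3.
# Transposing the left factor makes the inner numbers agree: (2 x 3) times (3 x 2).
gram = X.T @ X
print("X.T @ X shape:", gram.shape)
print("X.T @ X:\n", gram)
```

Products of the form $X^{T} X$ are one common place the transpose appears in linear regression formulas.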

Connecting to Machine Learning

Everything in this lesson converges on one idea that sits at the heart of modern machine learning: a layer of a neural network is a matrix multiply.

Picture a single layer that takes an input vector $\mathbf{x}$ of feature values and produces an output vector. The layer stores its learned weights in a matrix $W$ . The core computation is:

\mathbf{z} = W \mathbf{x} + \mathbf{b}

That is a matrix multiplied by a vector (exactly the operation you have been practicing) plus a bias vector $\mathbf{b}$ that shifts the result. The weight matrix $W$ is precisely a transformation of space: it stretches, rotates, and shears the input vector into a new representation. Stack several such layers and you are composing transformations, one after another, and composing the matrix parts is itself just more matrix multiplication. Real networks also apply a nonlinear activation function between layers, which is what keeps the stack from collapsing into a single transformation of this same form.

This is why matrices matter so much. When a model “learns,” it is adjusting the numbers inside its weight matrices so that the transformations they apply pull useful structure out of the data. And when training runs fast on a GPU, it is because hardware is built to do enormous matrix multiplications in parallel.

# A tiny "layer": transform a 2-feature input with a weight matrix
W = np.array([[0.5, -0.2],
              [0.1,  0.8]])
x = np.array([2.0, 3.0])
b = np.array([1.0, -1.0])

z = W @ x + b
print("Layer output shape:", z.shape)
print("Layer output:", z)
# Output:
# Layer output shape: (2,)
# Layer output: [1.4 1.6]

The exact numbers here come from arithmetic you can check: the first output is $(0.5 \times 2.0) + (-0.2 \times 3.0) + 1.0 = 1.0 - 0.6 + 1.0 = 1.4$ , and the second is $(0.1 \times 2.0) + (0.8 \times 3.0) - 1.0 = 0.2 + 2.4 - 1.0 = 1.6$ . A real network does the same thing with far larger matrices, but the operation is identical to what you just ran.
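The same layer can process a whole table of inputs at once, which is where the table view and the transformation view meet. The sketch below assumes the common convention of one observation per row; the extra rows in X are made-up inputs, and the expression X @ W.T + b relies on NumPy adding b to every row.

```python
import numpy as np

W = np.array([[0.5, -0.2],
              [0.1,  0.8]])
b = np.array([1.0, -1.0])

# Three inputs stacked as rows; the first is the lesson's x
X = np.array([[2.0, 3.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Transposing W lets each row of X meet each row of weights
Z = X @ W.T + b
print("Batch output shape:", Z.shape)

# Each row of Z matches the single-vector formula W @ x + b
for x_i, z_i in zip(X, Z):
    assert np.allclose(W @ x_i + b, z_i)
print("First row matches the single-input result:", np.allclose(Z[0], [1.4, 1.6]))
```

Notice the transpose doing its job of making shapes line up: X is $3 \times 2$ and W.T is $2 \times 2$, so the result has one output row per input row.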

Practice Exercises

Now it is your turn. Try these before checking the hints.

Exercise 1: Multiply Two Matrices by Hand, Then Verify

Compute the product of these two matrices by hand, one entry at a time, then confirm your answer with NumPy’s @ operator.

A = \begin{bmatrix} 2 & 0 \\ 1 & 3 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 4 \\ 2 & 1 \end{bmatrix}

import numpy as np

A = np.array([[2, 0], [1, 3]])
B = np.array([[1, 4], [2, 1]])

# Your code here: compute A @ B

Hint

Each entry is a row of $A$ dotted with a column of $B$ . The top-left entry is $(2 \times 1) + (0 \times 2) = 2$ . Work out the other three the same way, then check with print(A @ B). You should get [[2, 8], [7, 7]].

Exercise 2: Read a Determinant as Area Scaling

Create the matrix below in NumPy, compute its determinant with np.linalg.det, and explain in one sentence what it tells you about how this matrix changes area.

S = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}

import numpy as np

# Your code here

Hint

Use np.linalg.det(S). By the formula $ad - bc$ , the determinant is $(2 \times 3) - (0 \times 0) = 6$ . This matrix stretches the plane by 2 horizontally and 3 vertically, so it scales every area by a factor of 6.

Exercise 3: Confirm the Transpose Rule for Products

Using the matrices $M$ and $N$ from this lesson, verify the rule $(M N)^{T} = N^{T} M^{T}$ . Compute both sides and check that they are equal.

import numpy as np

M = np.array([[1, 2], [3, 4]])
N = np.array([[5, 6], [7, 8]])

# Your code here: compare (M @ N).T with N.T @ M.T

Hint

Compute left = (M @ N).T and right = N.T @ M.T, then run print(np.array_equal(left, right)), which should print True. Notice that the order flips: it is $N^{T} M^{T}$ , not $M^{T} N^{T}$ .

Summary

You have learned to read matrices in two ways and to perform the operations that drive machine learning under the hood. Let’s review what you covered.

Key Concepts

Matrices as Tables and Transformations

A matrix is a rectangular grid of numbers described by its shape (rows by columns)
The table view treats a matrix as data: rows are observations, columns are features
The transformation view treats a matrix as a function that moves every point in space

Matrix Operations

Addition and scalar multiplication act element by element and require matching shapes
Matrix multiplication fills each entry with a dot product of a row and a column
The shape rule: $(m \times n) \times (n \times p) \rightarrow (m \times p)$ ; the inner numbers must match
Matrix multiplication is not commutative: in general $MN \neq NM$
Use @ (not *) in NumPy for true matrix multiplication

Geometry and Special Matrices

A matrix reshapes the unit square into a parallelogram; tracking corners shows what it does
The determinant is the factor by which a matrix scales area; $\det(T) = 0.88$ for the shear shown
For a $2 \times 2$ matrix, $\det = ad - bc$ ; a determinant of $0$ collapses space and cannot be undone
The identity matrix $I$ changes nothing; the transpose $A^{T}$ swaps rows and columns

The Machine Learning Connection

A neural network layer computes $\mathbf{z} = W \mathbf{x} + \mathbf{b}$ , a matrix multiply plus a shift
Learning means adjusting the numbers inside weight matrices, which are transformations of space

Why This Matters

Matrices are the language data is written in and the engine that models run on. Every dataset you load is a matrix, every layer of a neural network is a matrix multiply, and the speed of modern deep learning comes from hardware built to multiply matrices at scale. The geometric view you practiced here, seeing a matrix bend and stretch space and reading the determinant as an area scale factor, is what lets you reason about what a model is actually doing instead of treating it as a black box. When you later study how models are trained, these same operations will reappear in the formulas for predictions and gradients.

Next Steps

You can now multiply matrices, read their geometry, and connect them to neural network layers. Next, you will return to systems of equations and learn when a set of vectors truly carries independent information, which determines whether a system has one solution, none, or infinitely many.

Continue to Lesson 7 - Solution Sets and Linear Independence

Learn how solution sets are described and what it means for vectors to be linearly independent.

Back to Module Overview

Return to the Math Foundations module overview.

Keep Building Your Skills

You have taken a big step by learning to see a matrix as both a table and a transformation. That dual view is exactly how practitioners think: a dataset is a matrix to be processed, and a model is a stack of matrices that reshape data into predictions. Keep the unit-square picture in mind as you go forward. Whenever you meet a new matrix operation, ask yourself what it does to space, and the math will stay grounded in something you can actually see.
