Lesson 5 - Vectors

Welcome to Vectors

This lesson introduces the vector, the single most important data structure in machine learning. Almost everything a model touches, from a single customer record to the weights it learns, is a vector. You will learn how to represent vectors, combine them with addition and scaling, and measure them with the dot product and magnitude, all while building the geometric intuition that makes these operations feel natural.

By the end of this lesson, you will be able to:

  • Represent a vector as an ordered list of numbers and as an arrow in space
  • Add and subtract vectors and multiply a vector by a scalar
  • Compute the dot product of two vectors and explain what it produces
  • Compute the magnitude (length) of a vector with the Euclidean norm
  • Connect the dot product to the angle between vectors and to the idea of similarity in machine learning

You should be comfortable with basic Python and NumPy and with the coordinate plane from earlier lessons. No prior linear algebra is needed. Let’s begin.


What Is a Vector?

A vector is simply an ordered list of numbers. That is the whole definition. The vector

$$\vec{a} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

holds two numbers: a 2 in the first position and a 1 in the second. The order matters, because $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ is a different vector from $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$. The number of entries is called the dimension of the vector, so $\vec{a}$ is a two-dimensional vector. We write it as a column to follow the standard convention in linear algebra, but it represents the same information as the horizontal list [2, 1].

In machine learning, a vector is how you describe a single example. Picture a dataset of houses. One house might be described by its size, number of bedrooms, and age:

$$\vec{x} = \begin{bmatrix} 1800 \\ 3 \\ 12 \end{bmatrix}$$

That bundle of three numbers is a feature vector: one observation, with each entry holding one feature. Every row of a dataset is a vector, and so is the set of weights a model learns. Once you are fluent with vector operations, you are fluent with the language models actually speak.
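As a small sketch, the house above can be stored as a NumPy array. The variable names (feature_names, house, swapped) are illustrative choices, not part of any real dataset.

```python
import numpy as np

# One house described by three features, always in the same order.
# The labels are illustrative; the source only names size, bedrooms, and age.
feature_names = ["size", "bedrooms", "age"]
house = np.array([1800.0, 3.0, 12.0])

print("dimension:", house.size)  # number of entries in the vector
for name, value in zip(feature_names, house):
    print(name, "=", value)

# Order matters: the same numbers in a different order are a different vector.
swapped = np.array([3.0, 1800.0, 12.0])
print("same vector?", np.array_equal(house, swapped))
```

Keeping the feature order fixed across every row is what lets a model treat position 0 as "size" for all houses.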

Vectors as Arrows

When a vector has two or three entries, you can draw it. The trick is that a vector is not a single point; it is a direction and a length. By convention you draw it as an arrow starting at the origin $(0, 0)$ and ending at the coordinates the vector names. So $\vec{a} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ is an arrow from the origin to the point $(2, 1)$.

An arrow captures the two properties that define every vector:

  • Direction: the way the arrow points.
  • Magnitude: how long the arrow is.

Two arrows that point the same way but have different lengths are different vectors, and so are two arrows of the same length pointing different ways. The figure below shows two vectors, $\vec{a} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$, each drawn from the origin, along with their sum (which we will reach shortly).

[Figure: Two vectors a = [2, 1] and b = [1, 3] drawn from the origin, with their sum a + b = [3, 4]. Vectors are arrows from the origin; adding them tip-to-tail gives the diagonal of the parallelogram they form.]

Row vs column, the same thing

You will see vectors written both as a horizontal row [2, 1] and as a vertical column. Mathematically they carry the same numbers in the same order. The column form is traditional in linear algebra because it lines up cleanly with matrix multiplication, which you will meet in the next lesson. In NumPy, a flat array like np.array([2, 1]) is the most convenient representation for the work in this lesson.
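To make the row, column, and flat forms concrete, here is a minimal sketch using NumPy's reshape. The variable names are illustrative.

```python
import numpy as np

flat = np.array([2.0, 1.0])      # flat 1-D array, shape (2,)
column = flat.reshape(-1, 1)     # column form, shape (2, 1)
row = flat.reshape(1, -1)        # row form, shape (1, 2)

print("flat:  ", flat.shape)
print("column:", column.shape)
print("row:   ", row.shape)

# All three hold the same numbers in the same order.
print(np.array_equal(column.ravel(), flat))
print(np.array_equal(row.ravel(), flat))
```

The shapes differ even though the contents match, which is why this lesson sticks to the flat form throughout.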


Creating Vectors in NumPy

You represent vectors in Python with NumPy arrays. A one-dimensional array is the natural choice: it is just a flat list of numbers.

import numpy as np

a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])

print("a =", a)
print("b =", b)
print("shape of a:", a.shape)
# Output:
# a = [2. 1.]
# b = [1. 3.]
# shape of a: (2,)

The shape (2,) tells you a is a one-dimensional array with two entries. Using floating-point literals like 2.0 keeps NumPy in floating-point arithmetic, which matters once you start computing lengths and angles that are rarely whole numbers.

These two vectors, a and b, will carry us through the entire lesson. Every operation below uses them, so you can check each result against the figure above.


Vector Addition and Subtraction

You add two vectors by adding their entries position by position. The first entry of the result is the sum of the first entries, the second is the sum of the second entries, and so on. For our vectors:

$$\vec{a} + \vec{b} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 + 1 \\ 1 + 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$$

Geometrically, addition is tip-to-tail. Slide the start of $\vec{b}$ to the tip of $\vec{a}$; the sum is the arrow from the original origin to where $\vec{b}$ now ends. That is exactly the diagonal arrow reaching $(3, 4)$ in the figure above. Because you can add the two vectors in either order and land in the same place, they trace out a parallelogram, and the sum is its diagonal.

Subtraction works the same way, position by position:

$$\vec{b} - \vec{a} = \begin{bmatrix} 1 - 2 \\ 3 - 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$

In NumPy, the + and - operators do this elementwise, so the code reads exactly like the math.

print("a + b =", a + b)
print("b - a =", b - a)
# Output:
# a + b = [3. 4.]
# b - a = [-1.  2.]

You can only add or subtract vectors of the same dimension. Trying to add a two-entry vector to a three-entry vector is undefined, and NumPy will raise an error if you try.
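The mismatch case can be sketched as follows. The three-entry vector is made up for illustration. Note that NumPy's broadcasting rules do accept some unequal shapes (for example, a length-1 array combined with a longer one), so the error shown here applies to shapes like (2,) and (3,) that cannot be broadcast together.

```python
import numpy as np

two = np.array([2.0, 1.0])
three = np.array([1.0, 3.0, 5.0])   # illustrative three-entry vector

try:
    two + three                      # shapes (2,) and (3,) do not match
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("Cannot add:", err)
```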

Why addition matters in ML

Vector addition is everywhere in machine learning. When a model updates its weights during training, it adds a small correction vector to the current weight vector. When you average a batch of feature vectors, you add them up and scale the result. The humble elementwise sum is the workhorse of model training.
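Both uses can be sketched in a few lines. All numbers below are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from a real model.

```python
import numpy as np

# A weight update: add a small correction vector to the current weights.
weights = np.array([0.5, -0.2])       # hypothetical current weights
correction = np.array([0.01, 0.03])   # hypothetical small correction
new_weights = weights + correction
print("updated weights:", new_weights)

# Averaging a batch: add the feature vectors, then scale by 1 / count.
batch = [np.array([2.0, 1.0]), np.array([1.0, 3.0]), np.array([3.0, 2.0])]
total = batch[0] + batch[1] + batch[2]
average = total / len(batch)
print("batch average:", average)
```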


Scalar Multiplication

A scalar is just an ordinary number, as opposed to a vector. Multiplying a vector by a scalar stretches or shrinks it: every entry is multiplied by that number.

$$3 \cdot \vec{a} = 3 \cdot \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 3 \end{bmatrix}$$

For a positive scalar, the direction stays the same; only the length changes. Multiplying by a number greater than 1 lengthens the arrow, a number between 0 and 1 shrinks it, and a negative number flips it to point the opposite way while scaling its length by the scalar's absolute value.

print("3 * a =", 3 * a)
print("0.5 * a =", 0.5 * a)
print("-1 * a =", -1 * a)
# Output:
# 3 * a = [6. 3.]
# 0.5 * a = [1.  0.5]
# -1 * a = [-2. -1.]

Scalar multiplication and vector addition together let you build new vectors out of old ones. Scaling several vectors and adding the results is called a linear combination, and it is the foundation of nearly everything in linear algebra. We will not go further with combinations here, but keep the idea in mind: addition plus scaling is a surprisingly powerful toolkit.
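As a quick illustration of a linear combination, the sketch below scales the lesson's two vectors and adds the results. The coefficients 2 and -1 are arbitrary choices.

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])

# Linear combination: scale each vector, then add the scaled vectors.
scaled_a = 2 * a          # entries 4 and 2
scaled_b = -1 * b         # entries -1 and -3
combo = scaled_a + scaled_b
print("2a - b =", combo)  # entries 3 and -1
```

Changing the two coefficients moves the result around the plane, which is the sense in which addition plus scaling builds new vectors from old ones.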


The Dot Product

The operations so far returned new vectors. The dot product is different: it takes two vectors and returns a single number. You compute it by multiplying matching entries and summing the products.

$$\vec{a} \cdot \vec{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$$

For our vectors, with two entries each:

$$\vec{a} \cdot \vec{b} = (2)(1) + (1)(3) = 2 + 3 = 5$$

So the dot product of $\vec{a}$ and $\vec{b}$ is the scalar 5. Both vectors must have the same dimension, since every entry needs a partner. NumPy computes this with np.dot, or with the @ operator, which is the modern shorthand.

dot = np.dot(a, b)
print("a . b =", dot)
print("a @ b =", a @ b)   # same thing
# Output:
# a . b = 5.0
# a @ b = 5.0

That single number is doing a lot of quiet work. In machine learning, a model’s prediction for one example is very often a dot product: take the feature vector, take the weight vector, multiply matching entries, and sum them. Linear regression and the score inside logistic regression and neural networks all start with exactly this operation. Learning the dot product means learning how models combine inputs into a decision.
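Here is a minimal sketch of a prediction as a dot product, reusing the house feature vector from earlier in the lesson. The weights are hypothetical values picked for easy arithmetic, not learned from data, and real models typically add further terms that this sketch leaves out.

```python
import numpy as np

x = np.array([1800.0, 3.0, 12.0])       # features: size, bedrooms, age
w = np.array([100.0, 5000.0, -500.0])   # hypothetical weight per feature

# Multiply matching entries and sum them: the score for this example.
prediction = np.dot(w, x)
print("prediction:", prediction)

# The same value computed entry by entry, to show what np.dot is doing.
by_hand = w[0] * x[0] + w[1] * x[1] + w[2] * x[2]
print("by hand:   ", by_hand)
```

Each weight says how much its feature pushes the score up or down; the dot product adds those contributions into one number.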

The dot product returns a scalar

Of the operations covered so far, the dot product is the only one that does not return a vector. Addition, subtraction, and scaling all give you back a vector of the same shape. The dot product collapses two vectors into a single number, which is precisely why it is so useful for producing a score or a measure of similarity.


Magnitude: The Length of a Vector

The magnitude of a vector, also called its norm, is the length of its arrow. For a vector $\vec{a}$ with entries $a_1, a_2, \dots, a_n$, the Euclidean norm is

$$\lVert \vec{a} \rVert = \sqrt{a_1^2 + a_2^2 + \dots + a_n^2}$$

This is the Pythagorean theorem in disguise. The entries are the legs of a right triangle, and the magnitude is the hypotenuse. For our vector $\vec{a} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$:

$$\lVert \vec{a} \rVert = \sqrt{2^2 + 1^2} = \sqrt{4 + 1} = \sqrt{5} \approx 2.2361$$

There is a neat connection to the dot product: a vector dotted with itself gives the sum of its squared entries, so the magnitude is the square root of that self dot product. NumPy gives you the norm directly through np.linalg.norm.

mag_a = np.linalg.norm(a)
print("|a| =", round(mag_a, 4))

# the same value, via the dot product
print("sqrt(a . a) =", round(np.sqrt(np.dot(a, a)), 4))
# Output:
# |a| = 2.2361
# sqrt(a . a) = 2.2361

Magnitude matters because many machine learning techniques care about the size of a vector. L2 regularization, for instance, penalizes large weight vectors to keep a model simple, and it measures “large” with this Euclidean norm (usually its square). Scaling features, which you met in the foundations module, is partly about controlling these magnitudes so no single feature dominates.
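To illustrate how a norm can act as a penalty, the sketch below assumes one common form: a strength multiplied by the squared Euclidean norm of the weights. The strength value and both weight vectors are hypothetical, and other penalty forms exist.

```python
import numpy as np

strength = 0.1                          # hypothetical penalty strength

small_weights = np.array([0.5, -0.2])   # hypothetical weight vectors
large_weights = np.array([5.0, -2.0])

# Assumed penalty form: strength * (Euclidean norm squared).
penalty_small = strength * np.linalg.norm(small_weights) ** 2
penalty_large = strength * np.linalg.norm(large_weights) ** 2

print("penalty (small weights):", round(penalty_small, 4))
print("penalty (large weights):", round(penalty_large, 4))
```

The larger weight vector is ten times longer, so under this assumed form its penalty is a hundred times bigger.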


The Geometry of the Dot Product

Here is where the dot product becomes intuitive. It is tied directly to the angle between two vectors through this relationship:

$$\vec{a} \cdot \vec{b} = \lVert \vec{a} \rVert \, \lVert \vec{b} \rVert \cos\theta$$

where $\theta$ is the angle between the two arrows. Rearranging gives a formula for the cosine of that angle:

$$\cos\theta = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$

Let’s compute it for our vectors. We already know $\vec{a} \cdot \vec{b} = 5$ and $\lVert \vec{a} \rVert = \sqrt{5}$. The magnitude of $\vec{b}$ is $\sqrt{1^2 + 3^2} = \sqrt{10}$. So

$$\cos\theta = \frac{5}{\sqrt{5} \cdot \sqrt{10}} = \frac{5}{\sqrt{50}} = \frac{1}{\sqrt{2}} \approx 0.7071$$

A cosine of about 0.7071 corresponds to an angle of 45 degrees, which you can confirm by eye in the figure: the two arrows are about an eighth of a full turn apart. Let’s verify the whole computation in NumPy.

cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print("cos(angle) =", round(cos_theta, 4))

angle_degrees = np.degrees(np.arccos(cos_theta))
print("angle =", round(angle_degrees, 2), "degrees")
# Output:
# cos(angle) = 0.7071
# angle = 45.0 degrees

This is the bridge between algebra and geometry. The dot product is not an arbitrary rule; it encodes how aligned two vectors are.

The Dot Product as Similarity

Look at what the cosine tells you about direction:

  • When two vectors point the same way, $\theta = 0$, $\cos\theta = 1$, and the dot product is at its largest for their lengths.
  • When they are perpendicular, $\theta = 90^\circ$, $\cos\theta = 0$, and the dot product is exactly zero.
  • When they point in opposite directions, $\theta = 180^\circ$, $\cos\theta = -1$, and the dot product is at its most negative.

So the dot product is a measure of alignment, or similarity. A large positive value means two vectors point in similar directions; zero means they share nothing in common directionally; a negative value means they oppose each other. Our cosine of 0.7071 says $\vec{a}$ and $\vec{b}$ are fairly well aligned but do not point in the same direction.
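The three cases listed above can be checked numerically. This is a small sketch; the helper name cosine and the comparison vectors are illustrative choices.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two nonzero vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([2.0, 1.0])

same = cosine(a, 3 * a)                    # same direction, different length
perp = cosine(a, np.array([-1.0, 2.0]))    # perpendicular: (2)(-1) + (1)(2) = 0
opposite = cosine(a, -1 * a)               # opposite direction

print("same direction:", round(same, 4))
print("perpendicular: ", round(perp, 4))
print("opposite:      ", round(opposite, 4))
```

Up to floating-point rounding, the three values land on 1, 0, and -1, matching the cases in the list.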

This is the heart of cosine similarity, one of the most widely used similarity measures in machine learning. Recommendation systems compare a user’s preference vector against item vectors with the dot product. Search engines rank documents by how well their word vectors align with your query. Modern language models compare meaning by taking dot products between embedding vectors. Every one of these is the geometry you just computed by hand.
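As a toy version of that idea, the sketch below ranks a few item vectors by cosine similarity to a user vector. The user vector, item names, and item vectors are all made up for illustration; real systems use far higher-dimensional vectors.

```python
import numpy as np

def cosine_similarity(u, v):
    """Dot product divided by both magnitudes (vectors must be nonzero)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

user = np.array([2.0, 1.0])               # hypothetical preference vector
items = {                                 # hypothetical item vectors
    "item_A": np.array([1.0, 3.0]),
    "item_B": np.array([4.0, 2.0]),
    "item_C": np.array([-1.0, 2.0]),
}

scores = {name: cosine_similarity(user, vec) for name, vec in items.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(name, round(scores[name], 4))
```

Here item_B ranks first because it points in exactly the same direction as the user vector, and item_C ranks last because it is perpendicular to it.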

Magnitude affects the raw dot product

The raw dot product mixes direction and length together: a long vector can produce a large dot product even when it points in a so-so direction. That is why similarity tasks usually use cosine similarity, which divides out both magnitudes and keeps only the angle. If you compare vectors of very different lengths, normalize first, or you will measure size when you meant to measure direction.
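The effect is easy to see with a made-up comparison: one short vector perfectly aligned with a query, and one long vector 45 degrees away from it. The vectors and names are illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

query = np.array([2.0, 1.0])
aligned_short = np.array([2.0, 1.0])     # same direction as the query
offaxis_long = np.array([10.0, 30.0])    # 10 * [1, 3]: long, 45 degrees away

raw_short = np.dot(query, aligned_short)   # (2)(2) + (1)(1) = 5
raw_long = np.dot(query, offaxis_long)     # (2)(10) + (1)(30) = 50
cos_short = cosine_similarity(query, aligned_short)
cos_long = cosine_similarity(query, offaxis_long)

print("raw dot:", raw_short, "vs", raw_long)
print("cosine: ", round(cos_short, 4), "vs", round(cos_long, 4))

# Normalizing first gives the same answer: the dot product of unit vectors
# is the cosine of the angle between them.
unit_query = query / np.linalg.norm(query)
unit_long = offaxis_long / np.linalg.norm(offaxis_long)
dot_of_units = np.dot(unit_query, unit_long)
print("dot of unit vectors:", round(dot_of_units, 4))
```

The raw dot product favors the long vector by a factor of ten, while the cosine correctly favors the aligned one.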


Putting It All Together

Here is every operation from this lesson in one short, runnable script. It is a compact reference you can return to whenever you need to remember how a vector operation works.

import numpy as np

a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])

print("a + b      =", a + b)                       # addition
print("3 * a      =", 3 * a)                        # scalar multiplication
print("a . b      =", np.dot(a, b))                 # dot product (a scalar)
print("|a|        =", round(np.linalg.norm(a), 4))  # magnitude

cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print("cos(angle) =", round(cos_theta, 4))          # similarity
# Output:
# a + b      = [3. 4.]
# 3 * a      = [6. 3.]
# a . b      = 5.0
# |a|        = 2.2361
# cos(angle) = 0.7071

In a dozen lines you represented vectors, combined them, and measured both their size and their alignment. These five operations are the alphabet of machine learning math.


Practice Exercises

Now it is your turn. Try these before checking the hints.

Exercise 1: Add and Scale

Create the vectors u = [4, 0] and v = [1, 2] in NumPy. Compute u + v, then compute the linear combination 2 * u + 3 * v, and print both results.

import numpy as np

# Your code here

Hint

Build the arrays with np.array([4.0, 0.0]) and np.array([1.0, 2.0]). Use the + operator for the sum and combine 2 * u with 3 * v for the linear combination. You should get u + v = [5. 2.] and 2 * u + 3 * v = [11. 6.].

Exercise 2: Dot Product and Magnitude

Using the same u = [4, 0] and v = [1, 2], compute their dot product with np.dot, and compute the magnitude of v with np.linalg.norm. Print both, rounding the magnitude to four decimal places.

# Your code here (reuse u and v from Exercise 1)

Hint

The dot product is (4)(1) + (0)(2) = 4. The magnitude of v is sqrt(1^2 + 2^2) = sqrt(5), about 2.2361. Use np.dot(u, v) and round(np.linalg.norm(v), 4).

Exercise 3: Perpendicular Vectors

The vectors p = [3, 0] and q = [0, 5] point along the two axes. Without running anything, predict their dot product and the angle between them. Then write code to compute the dot product and the cosine of the angle to check your prediction.

# Your code here

Hint

Vectors that lie along perpendicular axes are at right angles, so the angle is 90 degrees and cos(90°) = 0. The dot product is (3)(0) + (0)(5) = 0. Compute np.dot(p, q) and divide by the product of the norms to confirm you get 0.0.


Summary

Congratulations! You have learned the core data structure of machine learning and every operation that defines it. Let’s review what you covered.

Key Concepts

What a Vector Is

  • A vector is an ordered list of numbers; the count of entries is its dimension
  • A vector can be drawn as an arrow from the origin, capturing direction and magnitude
  • In machine learning, each data observation is a feature vector, one number per feature

Combining Vectors

  • Addition and subtraction work entry by entry, and only on vectors of equal dimension
  • Addition is geometrically tip-to-tail: $\vec{a} + \vec{b}$ is the diagonal of their parallelogram
  • Scalar multiplication stretches or shrinks a vector, changing length but not direction (unless the scalar is negative)

Measuring Vectors

  • The dot product multiplies matching entries and sums them, returning a single scalar
  • For $\vec{a} = [2, 1]$ and $\vec{b} = [1, 3]$, the dot product is 5
  • The magnitude (norm) is the length of the arrow, $\lVert \vec{a} \rVert = \sqrt{a_1^2 + a_2^2 + \dots}$; for $\vec{a} = [2, 1]$ it is about 2.2361

Geometry and Similarity

  • The dot product relates to the angle by $\vec{a} \cdot \vec{b} = \lVert \vec{a} \rVert \, \lVert \vec{b} \rVert \cos\theta$
  • For our vectors, $\cos\theta \approx 0.7071$, an angle of 45 degrees
  • For vectors of fixed length, a larger dot product means more aligned vectors, which is the basis of cosine similarity

NumPy Tools

  • np.array([2.0, 1.0]) creates a vector
  • a + b, 3 * a do elementwise addition and scaling
  • np.dot(a, b) or a @ b computes the dot product
  • np.linalg.norm(a) computes the magnitude

Why This Matters

Vectors are not an abstract detour on the way to machine learning; they are the substance of it. Every row of your data, every set of weights a model learns, and every embedding produced by a neural network is a vector. The operations you practiced here are the operations models perform millions of times during training: they add correction vectors to weights, they scale gradients, and above all they take dot products to turn inputs into predictions.

The dot product deserves special attention. It is simultaneously the engine of linear models and a measure of similarity, and that dual role is why it shows up in recommendation systems, search ranking, and language models alike. When you understand that a dot product measures how aligned two vectors are, a surprising amount of machine learning stops looking like magic and starts looking like geometry. The next lesson extends these ideas from single vectors to whole tables of them with matrix algebra.


Next Steps

You now know how to represent, combine, and measure vectors. In the next lesson you will scale up from individual vectors to matrices, learning how to multiply them and what those operations do geometrically.

Continue to Lesson 6 - Matrix Algebra

Move from single vectors to matrices and learn how matrix multiplication transforms space.



Keep Building Your Skills

You have just learned the alphabet of machine learning math. Vectors and their operations look simple, but they recur at every scale, from a single data point to the billions of parameters inside a large model. As you move into matrices and beyond, keep returning to the geometric pictures from this lesson: an arrow with direction and length, a tip-to-tail sum, a dot product measuring alignment. Hold onto that intuition, and the more advanced math ahead will feel like a natural extension of what you already understand.