Lesson 2 - Generating Embeddings

Welcome to Generating Embeddings

In Lesson 1 you saw what an embedding is and why nearby vectors mean similar text. Now you’ll make them yourself. The good news: you don’t need an API key or a credit card, and after the first run you don’t need an internet connection either. A small open model called all-MiniLM-L6-v2 runs right on your machine, turning any string into a 384-number vector in milliseconds. By the end of this lesson you’ll have generated real embeddings, encoded a whole batch at once, and confirmed exactly what the model hands back.

By the end of this lesson, you will be able to:

  • Install sentence-transformers and load an embedding model
  • Encode a single string into a 384-dimensional vector
  • Encode a list of strings in one batch and read the resulting 2D array
  • Inspect a vector’s shape and confirm it is L2-normalized

Everything here runs locally and for free. Let’s get set up.


Installing and Loading the Model

The sentence-transformers library wraps a collection of ready-to-use embedding models behind a single, clean interface. Install it with pip:

pip install sentence-transformers

That one command pulls in everything you need, including PyTorch under the hood. Once it finishes, loading a model is a single line:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
print(model)

Output:

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, ...})
  (1): Pooling({'word_embedding_dimension': 384, ...})
  (2): Normalize()
)

The string "all-MiniLM-L6-v2" is the model’s name. The first time you run this, the library downloads the model’s weights (roughly 90 MB) and caches them on disk. Every run after that loads from the cache, so it’s fast and works completely offline. Notice the three stages printed above: a Transformer that reads the text, a Pooling layer that compresses it into one fixed-length vector, and a Normalize layer — that last one becomes important shortly.
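To make the last two stages concrete, here is a minimal sketch of pooling followed by normalization, written in plain Python so it runs without the model. It assumes mean pooling (averaging the per-token vectors), which is a common choice but is an assumption here, and it uses tiny made-up 4-dimensional token vectors in place of real 384-dimensional ones. The helper names mean_pool and l2_normalize are illustrative and are not part of sentence-transformers.

```python
import math

def mean_pool(token_vectors):
    """Average a list of per-token vectors into one fixed-length vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[i] for tok in token_vectors) / count for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector so its L2 norm (Euclidean length) is 1.0."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

# Made-up token vectors (toy data): a 2-token input and a 5-token input.
short_text = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 4.0]]
long_text = [[0.5, 1.0, 0.0, 2.0]] * 5

for tokens in (short_text, long_text):
    pooled = l2_normalize(mean_pool(tokens))
    norm = math.sqrt(sum(x * x for x in pooled))
    print(len(tokens), "tokens ->", len(pooled), "numbers, norm", round(norm, 4))
```

Both inputs come out as 4 numbers with a norm that rounds to 1.0, even though one has 2 tokens and the other has 5. Pooling is what makes the output length independent of the input length.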

The model downloads once, then runs offline

The first call to SentenceTransformer("all-MiniLM-L6-v2") downloads the weights to a local cache (typically under ~/.cache/). After that, the model loads from disk on every run: no network, no API key, no per-request cost. You can generate thousands of embeddings without spending a cent.


Encoding a Single String

To turn text into a vector, call model.encode() with a string. The model returns a NumPy array:

import numpy as np

vector = model.encode("The cat sat on the mat.")

print("type: ", type(vector).__name__)
print("shape:", vector.shape)
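# Convert the float32 values to Python-precision floats first so the rounded numbers print cleanly.
print("first 5 numbers:", vector[:5].astype(float).round(4).tolist())

Output:

type:  ndarray
shape: (384,)
first 5 numbers: [0.1302, -0.0158, -0.0367, 0.058, -0.0598]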

That’s a real embedding. The shape (384,) tells you it’s a one-dimensional array of 384 numbers — the fixed dimensionality of this model. As you saw in Lesson 1, the individual values don’t mean anything on their own; it’s the overall pattern that encodes meaning. The same call works on a single word, a sentence, or a whole paragraph: the output is always 384 numbers, which is exactly what lets you compare any two pieces of text later.


Vectors Are L2-Normalized

Remember the Normalize() layer in the model’s structure? It guarantees something useful: every vector this model produces has a length (L2 norm) of 1.0. In other words, each embedding is a point on a unit sphere — only its direction carries meaning, not its magnitude. You can verify this directly:

norm = np.linalg.norm(vector)
print("L2 norm:", round(float(norm), 4))

Output:

L2 norm: 1.0

A norm that rounds to 1.0 confirms the vector is normalized (floating-point arithmetic can leave the raw value a hair above or below exactly 1). This isn’t just a curiosity: it makes the similarity math in the next lesson cleaner and faster, because for unit vectors, cosine similarity reduces to a plain dot product. Not every embedding model normalizes its output, but all-MiniLM-L6-v2 does, and you can always check with np.linalg.norm.
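The claim about cosine similarity and the dot product can be checked with a few lines of arithmetic. The sketch below uses plain Python and hand-picked 3-dimensional unit vectors as stand-ins for real embeddings; the helper names dot, norm, and cosine_similarity are illustrative.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """L2 norm (Euclidean length) of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    return dot(a, b) / (norm(a) * norm(b))

# Two toy unit vectors (length 1.0), standing in for real embeddings.
u = [0.6, 0.8, 0.0]
v = [0.0, 0.6, 0.8]

print("norms: ", round(norm(u), 4), round(norm(v), 4))
print("cosine:", round(cosine_similarity(u, v), 4))
print("dot:   ", round(dot(u, v), 4))

# Stretch u to length 2: the direction is unchanged, so the cosine is too,
# but the raw dot product doubles.
scaled = [2 * x for x in u]
print("cosine (scaled):", round(cosine_similarity(scaled, v), 4))
print("dot (scaled):   ", round(dot(scaled, v), 4))
```

For the unit vectors the cosine and the dot product agree. Once one vector is stretched, the dot product changes while the cosine does not, which is why the shortcut is only safe when every vector has length 1.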


Encoding a Batch of Strings

Embedding one string at a time works, but it’s slow when you have many. Pass encode() a list of strings and the model processes them together in a single batch — far more efficient — and returns a 2D array with one row per input:

sentences = [
    "How do I reset my password?",
    "I forgot my login credentials and can't sign in.",
    "What's the weather like in Paris today?",
]

embeddings = model.encode(sentences)

print("type: ", type(embeddings).__name__)
print("shape:", embeddings.shape)
print("row norms:", np.linalg.norm(embeddings, axis=1).round(4).tolist())

Output:

type:  ndarray
shape: (3, 384)
row norms: [1.0, 1.0, 1.0]

The shape (3, 384) reads as “3 rows of 384 numbers each” — one embedding per sentence, stacked into a matrix. embeddings[0] is the vector for the first sentence, embeddings[1] for the second, and so on. Each row is still individually normalized to length 1.0. This batch shape is the natural format for everything that follows: it’s exactly what you’ll feed into similarity calculations in Lesson 3 and into a search index in Lesson 4. Whenever you’re embedding a collection of documents, hand the whole list to encode() at once rather than looping.
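One practical consequence of this layout is that row order matches input order, so each row can be paired back with its sentence. The sketch below illustrates that with plain Python lists instead of a NumPy array. The 3-dimensional vectors are invented by hand for illustration and were not produced by all-MiniLM-L6-v2.

```python
sentences = [
    "How do I reset my password?",
    "I forgot my login credentials and can't sign in.",
    "What's the weather like in Paris today?",
]

# Stand-in "embeddings": one invented unit-length row per sentence.
toy_embeddings = [
    [0.6, 0.8, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

# The batch shape is (number of rows, numbers per row).
shape = (len(toy_embeddings), len(toy_embeddings[0]))
print("shape:", shape)

# Row i belongs to sentence i, so a dict built with zip keeps them together.
by_sentence = dict(zip(sentences, toy_embeddings))

# Compare the first row against every row with a dot product.
query = toy_embeddings[0]
scores = [sum(q * r for q, r in zip(query, row)) for row in toy_embeddings]
for score, text in zip(scores, sentences):
    print(round(score, 2), text)
```

With a real array from model.encode(), the same information comes from embeddings.shape and embeddings[i]. In this toy data the two password-related rows were deliberately placed close together, so they score high against each other and the weather row scores 0.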


A Note on Hosted Embedding APIs

You’re running embeddings locally because it’s free, private, and more than good enough to learn on — and it will stay your default throughout this course. But it’s worth knowing the alternative exists. Some providers offer hosted embedding APIs: you send text over the network and get vectors back, with no model to download or run. For example, Voyage AI provides managed models such as voyage-3 that are larger than all-MiniLM-L6-v2, output higher-dimensional vectors, and tend to score better on retrieval benchmarks — at the cost of an API key and per-token billing.

One thing to know specifically for this course: Anthropic does not offer a first-party embeddings API, and instead recommends third-party options. That’s exactly why we reach for a local model here — it keeps the workflow simple, free, and fully under your control. If you later build a production search system over a large or specialized corpus, a hosted model can be worth the cost; for learning the concepts and building working prototypes, local embeddings are ideal.


Practice Exercises

Exercise 1: Encode your own sentence

Install sentence-transformers, load all-MiniLM-L6-v2, and encode a sentence of your choosing. Print the array’s shape and confirm it is (384,). Then print the first three numbers.

Hint

After model = SentenceTransformer("all-MiniLM-L6-v2"), call v = model.encode("your sentence here"). Use v.shape for the dimensionality and v[:3] to slice the first three values. The shape will be (384,) no matter how long your sentence is.

Exercise 2: Batch-encode a list

Put four short sentences in a Python list and pass the list to model.encode() in one call. Print the shape of the result and explain what each number in the shape means.

Hint

Passing a list returns a 2D array. With four sentences the shape is (4, 384): the first number is the count of sentences (rows), the second is the embedding dimension (columns). Row i is the embedding for sentence i.

Exercise 3: Check the norm

Take any single embedding you generated and compute its L2 norm with numpy. Confirm it rounds to 1.0, then explain in one sentence why that is true for this model.

Hint

Use np.linalg.norm(vector). It returns about 1.0 because all-MiniLM-L6-v2 ends with a Normalize layer that scales every output vector to unit length — so only its direction matters, not its magnitude.


Summary

You generated real embeddings on your own machine. After pip install sentence-transformers, loading all-MiniLM-L6-v2 takes one line, and the model is cached after its first download so later runs are fast and offline. Calling model.encode() on a single string returns a 384-number NumPy vector with shape (384,); calling it on a list returns a 2D array with one row per input, with shape (3, 384) for three sentences. Every vector this model produces is L2-normalized to a length of 1.0 (up to floating-point rounding), which you confirmed with np.linalg.norm. Hosted APIs like Voyage AI exist for larger production needs, but local embeddings are free, private, and the right tool for this course.

Key Concepts

  • sentence-transformers — the library that loads and runs open embedding models locally.
  • SentenceTransformer("all-MiniLM-L6-v2") — loads a 384-dimensional model, downloading and caching it on first use.
  • model.encode(text) — returns a 1D array for a string, or a 2D array (one row per item) for a list.
  • .shape is (384,) for one string, (n, 384) for a batch of n strings.
  • L2 normalization — every vector has length 1.0; only direction carries meaning.
  • Hosted embedding APIs — managed options like Voyage AI’s voyage-3 for larger-scale or production use.

Why This Matters

Generating embeddings is the gateway to everything else in this module and the modules that follow. Once you can turn any text into a comparable vector — one item or a whole batch at a time — you have the raw material for measuring similarity, building semantic search, and powering retrieval-augmented generation. The local, no-key workflow you set up here means you can experiment freely without limits or cost.


Next Steps

Continue to Lesson 3 - Measuring Similarity and Distance

Turn embeddings into scores: compute cosine similarity and distance to find which texts are closest in meaning.


Continue Building Your Skills

You can now generate embeddings on demand — single strings, full batches, and the normalized 384-dimensional vectors that make them comparable. Next you’ll put those vectors to work: measuring how similar two pieces of text really are, so “find related text” becomes a number you can rank.