Lesson 4 - Guided Project: Semantic Search
Welcome to the Semantic Search Project
This is the capstone for the module. Over the last three lessons you learned what an embedding is, how to turn many texts into vectors in a single batch, and how cosine similarity measures meaning rather than wording. Now you will put all of it to work and build something genuinely useful: a search engine that finds the right support answer even when the user’s question shares no words with it. Your corpus is a real set of customer-support FAQs, and your queries are the kind of messy, natural questions people actually type.
By the end of this project, you will be able to:
- Load and inspect a real FAQ corpus with pandas.
- Embed an entire collection of questions once, then reuse those vectors for every search.
- Rank the corpus against any user query using cosine similarity and return the top-k matches.
- Wrap the whole pipeline in a clean, reusable search(query, k=3) function.
Download the dataset before you begin: https://datatweets.com/datasets/support-faqs.csv. Save it next to your script or notebook as support-faqs.csv. It has three columns, id, question, and answer, with 15 rows of common support questions. Let’s build.
Step 1 - Load and Inspect the FAQ Data
Every search engine starts with a corpus: the collection of documents you want to search. Here the corpus is a table of FAQs. Load it with pandas and take a look before you touch any embeddings, so you know exactly what you are working with.
import pandas as pd
faqs = pd.read_csv("support-faqs.csv")
print(faqs.shape)
print(faqs[["id", "question"]].head())

(15, 3)
   id                      question
0   1   How do I reset my password?
1   2  How long does shipping take?
2   3   What is your return policy?
3   4  Do you ship internationally?
4   5     How can I track my order?

Fifteen rows and three columns, exactly as expected. The question column holds the text you will search over, and the answer column holds what you will eventually want to return to the user. For now, the questions are the part that matters: those are the documents you will turn into vectors.
A corpus this small is easy to reason about, which is the point. The techniques here scale to thousands or millions of documents without changing, but a 15-row table lets you read every result and confirm with your own eyes that the search is behaving.
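Before embedding anything, it can help to confirm the table really has the shape you expect. The sketch below is one way to do that. validate_corpus is a hypothetical helper, not part of pandas, and the small inline DataFrame stands in for support-faqs.csv: it reuses the five questions printed above, with placeholder answers.

```python
import pandas as pd

REQUIRED_COLUMNS = ["id", "question", "answer"]


def validate_corpus(df):
    """Return a list of human-readable problems; an empty list means OK."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        # Without the expected columns the remaining checks cannot run.
        return [f"missing columns: {missing}"]
    problems = []
    if df["id"].duplicated().any():
        problems.append("duplicate ids")
    # Treat NaN and whitespace-only strings as empty questions.
    empty = df["question"].isna() | (df["question"].astype(str).str.strip() == "")
    if empty.any():
        problems.append(f"{int(empty.sum())} empty question(s)")
    return problems


# Stand-in for pd.read_csv("support-faqs.csv"); the answers are placeholders.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "question": [
        "How do I reset my password?",
        "How long does shipping take?",
        "What is your return policy?",
        "Do you ship internationally?",
        "How can I track my order?",
    ],
    "answer": ["(placeholder)"] * 5,
})
print(validate_corpus(sample))  # → []
```

An empty or missing question would still produce a vector, just not a meaningful one, so catching it here is cheaper than debugging a strange search result later.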
Step 2 - Embed All FAQ Questions Once
This is the heart of the system. You convert every question into a vector a single time, up front. That collection of vectors becomes your searchable index. You never have to re-read or re-embed the corpus again during a search; you only embed the incoming query and compare it against this fixed set.
Load the all-MiniLM-L6-v2 model from Lesson 1 and encode the whole question column in one batched call.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
question_embeddings = model.encode(faqs["question"].tolist())
print(question_embeddings.shape)

(15, 384)

The shape tells the whole story: 15 rows, one per FAQ, and 384 numbers per row. That 384 is the dimensionality of this model’s embedding space, which is fixed no matter how long or short each question is. Every question, from a five-word one to a long one, ends up as a point in the same 384-dimensional space, which is exactly what lets you compare them.
Note that you passed faqs["question"].tolist() as a single list. The model batches that internally, which is far faster than encoding the questions one at a time in a loop, just as you saw in Lesson 2. This one call is the most expensive step in the entire project, and you only pay for it once.
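Because that one call is the expensive step, you may also want to avoid repeating it every time you restart your script or notebook. The sketch below saves the matrix with NumPy and reloads it on later runs. The helper name load_or_build and the cache filename are hypothetical, and a random array stands in for model.encode(...) so the example runs without downloading the model.

```python
import os
import tempfile

import numpy as np


def load_or_build(cache_path, build_fn):
    """Load a cached embedding matrix, or build it once and save it."""
    if os.path.exists(cache_path):
        return np.load(cache_path)
    embeddings = np.asarray(build_fn(), dtype=np.float32)
    np.save(cache_path, embeddings)
    return embeddings


calls = {"count": 0}


def fake_encode():
    # Stand-in for model.encode(faqs["question"].tolist()): same shape as
    # the real (15, 384) matrix, but the values are random.
    calls["count"] += 1
    return np.random.default_rng(0).normal(size=(15, 384))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "question_embeddings.npy")
    first = load_or_build(path, fake_encode)   # builds and saves
    second = load_or_build(path, fake_encode)  # loads from disk
    print(first.shape, calls["count"])  # → (15, 384) 1
```

If the FAQ file changes, delete the cache file so the vectors are rebuilt; a stale cache would silently rank queries against the old questions.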
Step 3 - Embed a Query and Rank by Cosine Similarity
Now the search itself. A user types a question, you embed it the same way you embedded the corpus, and you measure how close that query vector is to each of the 15 question vectors using cosine similarity. The closest ones are the best matches.
First, a small helper for cosine similarity between one query vector and the whole matrix of question vectors. Normalizing each vector to unit length turns the dot product into the cosine of the angle between them, which is exactly the similarity score from Lesson 3.
import numpy as np
def cosine_similarity(query_vec, matrix):
    # Scale the query and every row of the matrix to unit length.
    query_norm = query_vec / np.linalg.norm(query_vec)
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    # Dot products of unit vectors are cosines: one score per row.
    return matrix_norm @ query_norm

The @ operator does a matrix-vector multiply, computing all 15 similarity scores in a single operation rather than looping. Now embed a query, score it against the corpus, and pull out the three highest.
query = "What is the deadline for returning a purchase?"
query_embedding = model.encode(query)
scores = cosine_similarity(query_embedding, question_embeddings)
top_indices = np.argsort(scores)[::-1][:3]
for i in top_indices:
    print(round(float(scores[i]), 3), "|", faqs.iloc[i]["question"])

0.596 | What is your return policy?
0.452 | How long does shipping take?
0.381 | How do I contact customer support?

Look closely at what just happened. The user asked about “the deadline for returning a purchase.” The top match is “What is your return policy?” with a score of 0.596, even though the query and the FAQ share almost no words. There is no “deadline,” no “returning,” and no “purchase” in the matched question. A search that relies on exact keyword matches would have missed it completely. The embedding model understood that returning a purchase and a return policy are about the same thing, and that is the entire reason semantic search exists.
np.argsort sorts the scores from lowest to highest, so reversing with [::-1] puts the strongest matches first, and [:3] keeps the top three. The second and third results are much weaker, which is the model honestly telling you nothing else in the corpus is really about returns.
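To see the ranking mechanics without any model involved, here is a toy version with hand-made 2-D vectors. The values are illustrative, not real embeddings, and the block repeats the cosine_similarity helper so it runs on its own.

```python
import numpy as np


def cosine_similarity(query_vec, matrix):
    query_norm = query_vec / np.linalg.norm(query_vec)
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix_norm @ query_norm


# Three made-up "document" vectors: same direction as the query,
# perpendicular to it, and opposite to it.
docs = np.array([
    [2.0, 0.0],
    [0.0, 5.0],
    [-3.0, 0.0],
])
query = np.array([1.0, 0.0])

scores = cosine_similarity(query, docs)
print(scores.tolist())  # → [1.0, 0.0, -1.0]

# argsort is ascending, so reverse it to get best-first order.
order = np.argsort(scores)[::-1]
print(order.tolist())  # → [0, 1, 2]
```

Notice that the lengths of the vectors (2, 5, and 3) have no effect on the scores. Only direction matters, which is why cosine similarity compares meaning rather than magnitude.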
Step 4 - Wrap It in a Reusable search Function
You now have every piece. The last step is to package them so you can search with one line and get back not just the matched question but the answer the user actually needs. Define a search function that takes a query and a k, embeds the query, ranks the corpus, and returns the top-k matches as clean dictionaries.
def search(query, k=3):
    # Embed only the query; the corpus vectors already exist.
    query_embedding = model.encode(query)
    scores = cosine_similarity(query_embedding, question_embeddings)
    # Indices of the k highest scores, best first.
    top_indices = np.argsort(scores)[::-1][:k]
    results = []
    for i in top_indices:
        results.append({
            "score": round(float(scores[i]), 3),
            "question": faqs.iloc[i]["question"],
            "answer": faqs.iloc[i]["answer"],
        })
    return results

Notice that the function relies on model, faqs, and question_embeddings already existing. The corpus embeddings are computed once outside the function and reused on every call, which keeps each search fast. Now try it on a completely different query, one phrased nothing like any FAQ in the corpus.
for match in search("I can't sign into my account", k=3):
    print(match["score"], "|", match["question"])
    print(" ->", match["answer"])

0.549 | How do I reset my password?
 -> Go to the sign-in page and click "Forgot password." We'll email you a secure link to choose a new one.
0.34 | How do I update my email address?
 -> Open Account Settings, edit the email field, and confirm the change from the verification email we send.
0.328 | How do I contact customer support?
 -> Email us at [email protected] or use the live chat button at the bottom-right of any page.

The user said “I can’t sign into my account.” The corpus has no FAQ that asks about signing in and none that repeats the user’s wording. Yet the top match, at 0.549, is “How do I reset my password?”, which is precisely the answer a stuck user needs. The model connected “can’t sign in” to “reset my password” through meaning alone. The gap between the top score (0.549) and the runners-up (0.34 and 0.328) also points to a clear winner rather than a near-tie.
You have built a working semantic search engine in about forty lines of code.
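The search function above reads model, faqs, and question_embeddings from the surrounding script. One possible alternative, sketched here under assumptions, bundles them into a small object so the pieces travel together and the encoder can be swapped in tests. FaqIndex and toy_encode are hypothetical names. toy_encode is a crude word-count stand-in for model.encode so the block runs without the model; it does not capture meaning the way a real embedding does.

```python
import numpy as np

VOCAB = ["password", "reset", "shipping", "return", "policy", "order"]


def toy_encode(text):
    # Stand-in for model.encode: counts vocabulary words. NOT semantic.
    words = text.lower().replace("?", "").split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)


class FaqIndex:
    def __init__(self, records, encode):
        self.records = records  # list of {"question": ..., "answer": ...}
        self.encode = encode
        # The corpus is embedded once, when the index is created.
        self.matrix = np.stack([encode(r["question"]) for r in records])

    def search(self, query, k=3):
        vec = self.encode(query)
        vec_norm = np.linalg.norm(vec)
        if vec_norm == 0:
            return []  # the toy encoder found no known words
        row_norms = np.linalg.norm(self.matrix, axis=1)
        scores = (self.matrix @ vec) / (row_norms * vec_norm)
        top = np.argsort(scores)[::-1][:k]
        return [{"score": round(float(scores[i]), 3), **self.records[i]}
                for i in top]


records = [
    {"question": "How do I reset my password?", "answer": "(placeholder)"},
    {"question": "How long does shipping take?", "answer": "(placeholder)"},
    {"question": "What is your return policy?", "answer": "(placeholder)"},
]
index = FaqIndex(records, toy_encode)
print(index.search("return policy", k=1)[0]["question"])
# → What is your return policy?
```

With the real model you would pass model.encode as the encode argument. The zero-norm guard exists only because the toy encoder can return an all-zero vector.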
Embed the corpus once, not once per query
The single most important habit in this project is that question_embeddings is computed one time, before any searching happens, and then reused on every call to search. Encoding text is the slow part; comparing vectors is nearly instant. If you re-embedded all 15 FAQs inside search, every query would redo that expensive work for no reason. Right now those vectors live in a NumPy array in memory, which is perfect for a small corpus. When you have millions of documents, you store and index them in a purpose-built vector database so you can search them in milliseconds without holding everything in memory or scanning every row. That is exactly where Module 6 goes next.
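The same habit can be pushed one step further as the corpus grows. The sketch below uses random stand-in vectors rather than real embeddings. It normalizes the corpus matrix once, so every query costs a single dot product, and it uses np.argpartition to pick the top k without fully sorting every score. The function name top_k and the corpus size are illustrative assumptions, and this is still a brute-force scan, not what a vector database does internally.

```python
import numpy as np

rng = np.random.default_rng(42)

# Random stand-ins for 10,000 document embeddings of dimension 384.
corpus = rng.normal(size=(10_000, 384)).astype(np.float32)
# Normalize once, up front; after this a dot product is a cosine.
corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)


def top_k(query_vec, unit_matrix, k=3):
    """Return (indices, scores) of the k most similar rows, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = unit_matrix @ q
    k = min(k, len(scores))
    # argpartition moves the k largest scores to the end, unsorted...
    candidates = np.argpartition(scores, -k)[-k:]
    # ...so only those k candidates need an actual sort.
    ordered = candidates[np.argsort(scores[candidates])[::-1]]
    return ordered, scores[ordered]


# Use a slightly noisy copy of row 123 as the query.
query = corpus[123] + 0.01 * rng.normal(size=384).astype(np.float32)
indices, scores = top_k(query, corpus_unit, k=3)
print(indices[0])  # → 123
```

For 15 FAQs the plain argsort from Step 3 is simpler and just as fast, so this only pays off when the score array is large.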
Extend the Project
The engine works, but a production system needs a few more touches. Try these to deepen your understanding.
Exercise 1 - Search over the answers too
Right now you only embed the question column. Sometimes a user’s wording matches the wording of an answer better than the question. Build a second set of embeddings from the answer column and let search rank against both, or against a combined “question + answer” string, and compare which gives better matches.
Hint
You can concatenate the columns before encoding: (faqs["question"] + " " + faqs["answer"]).tolist(). Embed that combined list once, store it alongside or instead of question_embeddings, and keep the ranking logic the same. The only thing that changes is what text each vector represents.
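If you keep two separate sets of embeddings instead of concatenating the text, you also need a rule for combining the two score lists. One simple option, assumed here for illustration, is to take the higher of the two scores for each FAQ. The score values below are made up; in the real project they would come from cosine_similarity against the question embeddings and the answer embeddings.

```python
import numpy as np

# Hypothetical scores for one query against 4 FAQs.
question_scores = np.array([0.20, 0.55, 0.10, 0.30])
answer_scores = np.array([0.60, 0.25, 0.15, 0.30])

# Each FAQ is credited with its best-matching field.
combined = np.maximum(question_scores, answer_scores)
ranking = np.argsort(combined)[::-1]

print(combined.tolist())  # → [0.6, 0.55, 0.15, 0.3]
print(ranking.tolist())   # → [0, 1, 3, 2]
```

Taking the maximum favors a strong match in either field. Averaging the two arrays is another option worth comparing, since it rewards FAQs that match moderately well in both.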
Exercise 2 - Add a “no good match” threshold
If a user asks something your FAQs simply do not cover, the engine still returns its best guess, which can be misleading. Add a similarity threshold so that when even the top score is below some cutoff, search returns an empty list or a message like “No matching FAQ found.”
Hint
Inside search, check the score of the best result before building the output. Something like if scores[top_indices[0]] < threshold: return []. Run a few off-topic queries such as “Do you sell pet food?” and watch the top scores to pick a sensible cutoff for this corpus.
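A minimal sketch of that check, using small score arrays in place of real query results. The helper name top_k_with_threshold and the 0.4 cutoff are assumptions for illustration, and apart from the three scores echoed from Step 3 the numbers are invented. The right cutoff depends on your corpus and model.

```python
import numpy as np


def top_k_with_threshold(scores, k=3, threshold=0.4):
    """Return indices of the top-k scores, or [] if the best is too weak."""
    if len(scores) == 0:
        return []
    top_indices = np.argsort(scores)[::-1][:k]
    # Only the single best score decides whether anything is returned.
    if scores[top_indices[0]] < threshold:
        return []
    return top_indices.tolist()


on_topic = np.array([0.12, 0.596, 0.452, 0.381])
off_topic = np.array([0.11, 0.18, 0.09, 0.21])

print(top_k_with_threshold(on_topic))   # → [1, 2, 3]
print(top_k_with_threshold(off_topic))  # → []
```

This version keeps weaker runners-up as long as the best match clears the cutoff. Filtering every result individually against the threshold is a stricter variant you could try.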
Exercise 3 - Return only the single best answer
For a chatbot or auto-reply, you often want one clean answer string rather than a ranked list of dictionaries. Add a thin wrapper, for example best_answer(query), that calls search(query, k=1) and returns just the answer field of the top result, respecting the threshold from Exercise 2.
Hint
Reuse search rather than rewriting the ranking. Call results = search(query, k=1), and if results is non-empty return results[0]["answer"], otherwise return a fallback string. Small composable functions are easier to test than one large one.
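One way the wrapper could look. To keep the block runnable on its own, search here is a hypothetical stub that returns canned results; in your project you would call the real search from Step 4, with the Exercise 2 threshold applied, instead. The fallback message is also just an example.

```python
FALLBACK = "No matching FAQ found."


def search(query, k=3):
    # Stub standing in for the real search(); returns canned results.
    canned = {
        "I can't sign into my account": [
            {
                "score": 0.549,
                "question": "How do I reset my password?",
                "answer": 'Go to the sign-in page and click "Forgot password."',
            },
        ],
    }
    return canned.get(query, [])[:k]


def best_answer(query):
    """Return the top answer string, or a fallback if nothing matched."""
    results = search(query, k=1)
    if results:
        return results[0]["answer"]
    return FALLBACK


print(best_answer("Do you sell pet food?"))  # → No matching FAQ found.
```

Because best_answer only depends on the list that search returns, you can test it with a stub like this one before wiring it to the real engine.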
Summary
You built a complete semantic search engine from the ground up. You loaded a real FAQ corpus with pandas, embedded all 15 questions once with all-MiniLM-L6-v2, and used cosine similarity to rank those questions against any incoming query. You saw the engine match “the deadline for returning a purchase” to the return policy and “I can’t sign into my account” to the password-reset FAQ, neither of which shares meaningful words with its match. Finally you packaged everything into a reusable search(query, k=3) function that returns the best answers, not just the closest questions.
Key Concepts
- Corpus embeddings computed once. The full set of document vectors is built a single time and reused for every query; only the query is embedded per search.
- Cosine similarity as the ranking signal. Normalizing vectors and taking their dot product gives a meaning-based score where higher is more similar.
- Top-k retrieval. Sorting the similarity scores and slicing the top results turns a pile of numbers into a short, useful answer list.
- Matching by meaning, not words. The engine succeeds precisely on queries that share no vocabulary with their answers, which is what separates semantic search from keyword search.
- A reusable search interface. Wrapping the pipeline in one function with a k parameter makes the engine easy to call, test, and extend.
Why This Matters
Semantic search is the retrieval layer underneath most modern AI products. It is how a help center surfaces the right article, how a chatbot grounds its answers in your documents, and how retrieval-augmented generation feeds relevant context to a language model before it writes a response. The exact pattern you built here, embed a corpus once and rank it against a query by cosine similarity, is the same pattern those systems use. The only thing that changes at scale is where the vectors live and how fast you can search them.
Next Steps
Continue to Module 6 - Vector Databases
Your search works on 15 FAQs in memory. Learn how vector databases store and index embeddings so the same search scales to millions of vectors in milliseconds.
Continue Building Your Skills
You now have a working semantic search engine and, more importantly, a clear mental model of how retrieval works under the hood. Keep the function open and feed it your own questions; the fastest way to build intuition for embeddings is to watch them get the answer right when the words are all wrong. When you are ready to make this fast at scale, Module 6 is waiting.