Lesson 1 - Cloud Data Warehousing with Snowflake
Welcome to Snowflake
So far in this course, every database you have worked with lived on a single machine — SQLite in a file, PostgreSQL on a server. In this lesson you meet a different kind of database: a cloud data warehouse that handles the servers, scaling, and storage for you so you can focus entirely on your data.
Snowflake is one of the most in-demand tools for modern data engineers. Companies like Capital One, HelloFresh, and Nissan use it to power their analytics, and it shows up constantly in job descriptions. By the end of this lesson you will understand what makes it special and have run your own queries against data you loaded yourself.
By the end of this lesson, you will be able to:
- Explain what Snowflake is and why it separates storage from compute
- Create and configure a virtual warehouse for query processing
- Load a CSV file into a Snowflake table
- Write and run SQL queries in a Snowflake worksheet
- Monitor credit usage and keep your compute costs low
You only need the SQL you already know. Let’s begin.
Data for this lesson
Platform: Snowflake (cloud data warehouse).
Data: the bookstore sales dataset — download bookstore.csv and load it into a BOOKS table via Data > Add Data.
What Is Snowflake?
Snowflake is a cloud-based data warehouse platform and a scalable SQL interface. It handles all the infrastructure for you — scaling up or down when needed, storing your data, and processing your queries. You do not configure servers or tune performance. You load your data and start querying.
The problem Snowflake solves
Analytics teams hit the same walls over and over: queries that take too long, data scattered across departments, and infrastructure that is hard to scale. With Snowflake, teams combine data from sales, finance, marketing, and support, then analyze it all in one place. They can share live datasets across departments — or with external partners — without duplicating or exporting anything. And because Snowflake scales compute automatically, it handles everything from a quick dashboard query to heavy end-of-quarter reporting.
Analysts explore and visualize Snowflake data in tools like Power BI or Tableau. Engineers build pipelines that clean and transform raw data. Data scientists run machine learning workflows directly against Snowflake tables without moving data into a separate system. Whether your data is structured tables, semi-structured logs like JSON, or large flat files, Snowflake is a single place to store, query, and share it.
What makes Snowflake different
The single most important idea in Snowflake is this: storage and compute are separated.
In a traditional system like PostgreSQL, your queries run on the same machine that stores your data. In Snowflake, those two jobs are split into two independent layers:
- Storage — where your data lives, in centralized cloud storage
- Compute — the resources that process your queries
This separation has big consequences. Different teams can run queries against the same data at the same time without slowing each other down. You can scale compute up when you need power and down when you do not. And you can pause compute entirely when nobody is querying — so you stop paying for it. Keep this storage-versus-compute split in mind, because almost everything else in Snowflake follows from it.
Getting Started
You can sign up for a free trial of Snowflake at signup.snowflake.com. A few choices come up during signup, and it helps to know what they mean.
You will pick a Snowflake edition. The “Standard” edition is ideal for learning — it gives you the core features at the lowest credit cost. You will also choose a cloud provider (AWS, GCP, or Azure). This does not require a cloud account or any prior knowledge. Snowflake runs on top of these platforms, and the choice simply decides where your data and compute are hosted behind the scenes. For learning, any provider and the default region are fine; you will never interact with the cloud provider directly.
No AWS account needed
Choosing AWS during signup does not mean you create an AWS account, set up an S3 bucket, or install a CLI. Snowflake is a fully managed service — selecting a cloud provider only tells Snowflake where to host your data internally. It manages all of that infrastructure for you.
The free trial includes a set of credits you can spend over a 30-day period, with no payment method required. Those trial credits are time-limited: any you do not use expire after 30 days and do not roll over, so it is worth experimenting freely while you have them.
Once you finish signup and log in, you land in Snowsight, Snowflake’s web-based interface. The navigation panel on the left has tabs for Projects, Data, Monitoring, Admin, and more. You do not need to explore everything at once — this lesson walks you through the pieces you need in order. Let’s start with the component that does the heavy lifting: the virtual warehouse.
Understanding Virtual Warehouses
Your data lives in centralized cloud storage and stays there regardless of how or when you query it. To actually process that data, Snowflake uses a virtual warehouse.
A virtual warehouse is not a physical building and not where your data is stored. It is a temporary engine made of compute resources that powers your SQL queries. Every time you run a query, a warehouse processes it.
Why you can have many warehouses
Because storage is separate, you can create multiple warehouses that all read the same data, each one independent. One team’s warehouse running a heavy report does not block another team’s quick lookup — both run in parallel against the same tables. This is the storage-and-compute separation paying off in practice.
A common misconception is that giving a teammate a different warehouse copies or locks the data. It does not. The data stays in one place; warehouses only supply the compute to read it.
Creating a warehouse
You create a warehouse from the Admin > Warehouses tab:
- Click + Warehouse in the upper-right corner.
- Give it a name like
LEARN_WH. - Choose a size. For practice, the smallest size — X-Small — is plenty.
Snowflake charges compute credits per second while a warehouse is active. The good news is that you can — and should — pause a warehouse when you are not using it to avoid wasting credits. Once created, you can start or stop it at any time.
Auto-suspend
By default, Snowflake will automatically suspend a warehouse after a few minutes of inactivity. You can find and adjust this under Advanced Options when you create or edit a warehouse. Auto-suspend protects you from accidentally leaving a warehouse running — exactly the kind of mistake that quietly burns through trial credits — so keep it enabled while you learn.
Pause early, pause often
If you forget to stop a warehouse, auto-suspend (when enabled) will stop it for you after the idle period. Without auto-suspend, a warehouse keeps running and consuming credits indefinitely. When in doubt, suspend it manually before you walk away.
Loading and Querying Data
Creating a database
Before you can load or query anything, you need a database to hold your work. In Snowflake, a database is a container — similar to a folder — that stores your tables and other objects and keeps them organized.
To create one:
- Go to Data > Databases in the left sidebar.
- Click + Database.
- Name it after your dataset. For this lesson, use
BOOKSTORE. - Click Create.
That is it — you now have a database to work in.
Creating a table and loading a CSV
There are two ways to create a table in Snowflake. You can write a CREATE TABLE statement by hand, or you can upload a CSV file and let Snowflake build the table for you. We will use the CSV upload method because it is the fastest way to get real data in front of you.
First, download the sample dataset of online bookstore sales from this link. Then go to Data > Add Data and choose Load data into a Table.
Before uploading, look at the top-right corner for the Warehouse dropdown. Even though you created LEARN_WH earlier, Snowflake may default to a pre-created warehouse named COMPUTE_WH. Switch the dropdown to LEARN_WH so you use the compute you set up yourself. It is not strictly required, but it is a good habit.
Mind the warehouse dropdown
The warehouse selection controls compute, not storage. If you leave it on COMPUTE_WH, your data still loads into your BOOKSTORE database just fine — but the compute cost is billed to COMPUTE_WH instead of the warehouse you intended. Snowflake will not switch warehouses automatically based on the database, so always check the dropdown.
Now load the file:
- Drag and drop
bookstore.csvinto the Load Data into Table pane. - Select
BOOKSTOREas the database. The schema should default toPUBLIC. - Name the table
books.
On the next screen you set the file format. Under “View Options,” set Header to First line contains header, then click Load.
Always set the header option
If you forget to mark the first line as a header, Snowflake treats it as data. The load still succeeds, but your column names become generic placeholders like COLUMN1, COLUMN2, and your real header row becomes a row of data. This is a common mistake that causes confusion later, so set the header option every time.
When the load finishes, you may see a pop-up offering to query the data. Skip it for now and click Done. You have just loaded real data into a cloud warehouse.
Working with Worksheets and Queries
With your data loaded, you explore it in a worksheet — Snowflake’s scratchpad for writing, running, and saving SQL.
To create one, go to Projects > Worksheets and click + Worksheet. New worksheets get a generic name; double-click the name at the top to rename it to something like Bookstore Queries so you can find it later.
Setting your context
Before you run anything, check the context selectors in the toolbar. They tell Snowflake where to look and what to use:
- Warehouse — the compute running your queries (e.g.
LEARN_WH) - Database — where your data lives (e.g.
BOOKSTORE) - Schema — a sub-container inside the database (usually
PUBLIC)
If your context is wrong, queries may return an error or no results. A frequent surprise is a “table does not exist” error on a table you just created — almost always the context selectors are pointing at the wrong database or schema, not at the one holding your table. Check them first.
Running queries
Snowflake uses a SQL dialect that is mostly ANSI-compliant, so if you have written SQL in PostgreSQL, MySQL, or SQLite, the core syntax works the same here. To run a query, click anywhere inside it and either click the Run button or use the shortcut: Cmd + Enter on Mac, Ctrl + Enter on Windows or Linux.
Only the currently selected query runs — the one your cursor sits in — even if the worksheet holds several. To run more than one, highlight them all first, then run.
Let’s preview the BOOKS table:
-- preview the books table
SELECT *
FROM BOOKS
LIMIT 10;Now answer a real question — the average price for each rating:
-- average price by rating
SELECT RATING, AVG(PRICE)
FROM BOOKS
GROUP BY RATING
ORDER BY RATING DESC;When you run a query, the results appear in a panel at the bottom of the worksheet. That panel always shows only your most recent query. Run another query and the panel replaces the old results. You will not see the output of an earlier query again unless you re-run it — or look it up in the History tab, which you will meet in a moment.
Try a few more on your own by adding them to the worksheet:
-- books priced over 50
SELECT *
FROM BOOKS
WHERE PRICE > 50;
-- number of books per author
SELECT AUTHOR, COUNT(*) AS book_count
FROM BOOKS
GROUP BY AUTHOR
ORDER BY book_count DESC;Modifying data
You can change data too, not just read it. Snowflake supports the statements you already know — CREATE OR REPLACE TABLE, INSERT INTO, UPDATE, and DELETE. Just remember that modifying data consumes compute credits exactly like querying does, so keep your datasets small while learning.
Viewing Query History
Because the results panel only shows your last query, Snowflake gives you a Query History tab to look back. Find it under Monitoring > Query History in the left sidebar.
This tab lists every query you have run recently across all of your worksheets. For each one you can see:
- The SQL text you ran
- The time it ran and how long it took
- The warehouse that processed it
- The status — successful, failed, and so on
Click any query to see its exact SQL and a preview of its results. This is the fastest way to recover a query after you close a worksheet, or to grab an earlier query and tweak it. If you ran a query an hour ago in one worksheet and have since moved on, its results are still here — you do not need to re-run it.
Two things to know: Query History shows only your own queries, not your teammates’, and it stores a limited window of history. For anything you want to keep long-term, name and save the worksheet.
Understanding Credits and Costs
Snowflake’s pricing is usage-based, and you pay for two things separately:
- Storage — priced per terabyte per month. For learning, this is negligible.
- Compute — based on how long your warehouse is active. A running warehouse consumes credits by the second.
Running queries uses compute, which is what triggers credit consumption. You cannot see the cost of an individual query in the History tab, but you can see overall consumption under Admin > Cost Management, broken down by day and by warehouse.
A few habits keep costs low while you learn:
- Pause your warehouse when it is idle. You only pay for compute while it runs, and an idle-but-running warehouse still burns credits by the second.
- Use a small warehouse size (X-Small) unless you genuinely need more power.
- Avoid rerunning queries unnecessarily. Snowflake charges for each execution. Running the same
SELECTthree times in a row costs you three times, even though the data never changed.
Learning to watch your credit usage now will make you far more efficient on real projects later.
Practice Exercises
Try these in your own Snowflake trial. They build directly on what you loaded above.
Exercise 1: Create and size a warehouse
Create a second virtual warehouse named REPORTING_WH at X-Small size with auto-suspend enabled. Then explain, in one sentence, why running a query on REPORTING_WH would not interfere with a query running on LEARN_WH against the same BOOKSTORE data.
Hint
Warehouses are created under Admin > Warehouses > + Warehouse. The reason they do not interfere comes straight from Snowflake’s core design: storage and compute are separated, so independent warehouses can read the same data in parallel.
Exercise 2: Explore the books table
In your Bookstore Queries worksheet, write queries to answer:
- How many books are in the table?
- What is the most expensive book?
- What is the average rating across all books?
-- Your queries hereHint
Use COUNT(*) for the total, MAX(PRICE) (or ORDER BY PRICE DESC LIMIT 1) for the most expensive, and AVG(RATING) for the average. Make sure your context selectors point at BOOKSTORE / PUBLIC before you run.
Exercise 3: Read your own history
Run three different queries, then open Monitoring > Query History. Find the query that took the longest to run and note which warehouse processed it. Then go to Admin > Cost Management and check how many credits you have used today.
Hint
Query History shows duration and warehouse for each query. Cost Management shows credits by day and warehouse — a good habit to build before you start running larger workloads.
Summary
You went from never having touched a cloud data warehouse to loading real data and querying it in Snowflake. Along the way you saw why separating storage from compute is the idea that makes everything else — parallel teams, on-demand scaling, pay-for-what-you-use pricing — possible.
Key Concepts
- Snowflake — a fully managed cloud data warehouse with a scalable SQL interface; it handles servers, scaling, and storage for you.
- Storage vs. compute — your data lives in centralized storage; warehouses provide the separate compute that processes queries. Separating them lets teams scale and work in parallel.
- Virtual warehouse — a temporary compute engine that powers your queries. You can run many, size them independently, pause them, and enable auto-suspend.
- Database, schema, table — a database is a container of objects; a schema is a sub-container (often
PUBLIC); a table holds your rows. Context selectors tell a worksheet where to look. - Worksheet — Snowflake’s SQL editor; the results panel shows only your most recent query.
- Query History — a record of your past queries, their timing, warehouse, and status.
- Credits — Snowflake bills compute per second of warehouse uptime; pause idle warehouses, use small sizes, and avoid needless reruns to control cost.
Why This Matters
Cloud data warehouses are where modern analytics happens. When a job description says “Snowflake experience preferred,” it means exactly the skills you just practiced: standing up compute, loading data, querying it with familiar SQL, and being responsible about cost. Because Snowflake is mostly ANSI-compliant, the SQL you already know transfers directly — what is new is the operational mindset of separating storage from compute and watching your credits. That mindset carries over to every other cloud warehouse, including BigQuery and Redshift.
Next Steps
You can now load and query data in a cloud warehouse and keep an eye on its cost. Next, you will step outside the relational world entirely and learn about NoSQL — what it is, the four main types, and when it beats SQL.
Continue to Lesson 2 - Introduction to NoSQL Databases
Learn how NoSQL differs from SQL, explore the four main types, and develop the judgment to choose the right one
Back to Module Overview
Return to the Production Database Tools module overview
Continue Building Your Skills
The best way to make Snowflake stick is to keep using it. Load a larger dataset, experiment with a bigger warehouse size and watch the credit difference, or try organizing your work into separate schemas. Every minute you spend in Snowsight builds the kind of practical, portfolio-ready experience that employers look for — and the next time you see “Snowflake” in a job posting, you will know exactly what it means.