Lesson 3 - Lists and For Loops
In this lesson, you’ll learn how to work with data in Python using lists and loops. We’ll practice basic techniques to store information in lists, access data by indexing, slice lists to get specific segments, and use for loops to process large amounts of data quickly. We’ll also explore how to read a data file (in this case, PageGarden.csv) and transform it into a manageable structure for analysis.
What is PageGarden.csv?
PageGarden.csv is a fictional dataset representing an online bookstore called “PageGarden.” Each row in this dataset describes a single book. For example, each row might include:
- The book title (text)
- The price (a number, e.g., 0.0 if it’s free)
- The currency (e.g., “USD”)
- The total number of customer reviews (a large integer)
- The average user rating (a floating-point number)
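A few sample rows from PageGarden.csv might look like this (note: this is sample data, not real):
title,price,currency,total_reviews,avg_rating
"Whispering Leaves",12.99,USD,1050300,4.2
"Berry Tales",0.0,USD,985500,4.5
"Moon Over Pine",7.50,USD,899000,4.6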
- The first row is called the “header” row. It describes what each column represents.
- Each subsequent row describes one book. For example, "Whispering Leaves" has:
  - price = 12.99
  - currency = USD
  - total_reviews = 1050300
  - avg_rating = 4.2
Over the course of these lessons, we’ll learn to read this file, store it in Python as a list of lists, and then perform simple data analysis tasks (like finding the average rating of all books).
Understanding Lists
When working with data, it’s essential to store and organize information in a way that’s easy to manage. One of the most basic and useful data structures in Python is the list.
A list is a collection of items placed in a specific order and enclosed in square brackets [ ]. These items are often called “elements” of the list. Lists are very flexible because they can store elements of many different data types, such as text (strings), numbers (integers and floats), and even other lists.
Why use lists for this data?
If we think about one book from our PageGarden.csv dataset, it has several pieces of information:
- The title (e.g., "Whispering Leaves")
- The price (e.g., 12.99)
- The currency (e.g., "USD")
- The total number of reviews (e.g., 1050300)
- The average rating (e.g., 4.2)
We could create a separate variable for each piece of information, like this:
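A minimal sketch of the one-variable-per-value approach. The variable names are hypothetical, chosen only for illustration; the values are the "Whispering Leaves" figures quoted above.

```python
# One variable per piece of information about a single book.
title_1 = "Whispering Leaves"
price_1 = 12.99
currency_1 = "USD"
total_reviews_1 = 1050300
avg_rating_1 = 4.2

print(title_1, price_1, currency_1, total_reviews_1, avg_rating_1)
```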
But imagine we had thousands of books! Creating so many variables would be messy and hard to manage.
Instead, we can use a list to store all these pieces of information about one book together:
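A minimal sketch of that list, using the "Whispering Leaves" values quoted above and the name book_1 that the rest of the lesson relies on:

```python
# One list holds every value for a single book, in a fixed order:
# title, price, currency, total reviews, average rating.
book_1 = ["Whispering Leaves", 12.99, "USD", 1050300, 4.2]

print(book_1)        # ['Whispering Leaves', 12.99, 'USD', 1050300, 4.2]
print(type(book_1))  # <class 'list'>
```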
Now, book_1 holds all the information for this single book in one place. We don’t have to keep track of multiple variables, just one list. This makes our code shorter, easier to read, and more scalable (meaning it’s easier to expand if we have more books).
Check the type:
- print(book_1) shows ['Whispering Leaves', 12.99, 'USD', 1050300, 4.2] (Python displays strings inside a list with single quotes).
- print(type(book_1)) shows <class 'list'>, confirming that book_1 is a list.
How to create a list:
- Write the elements separated by commas.
- Enclose them in square brackets.
For example:
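A short sketch of such a list, built from the three values described just below (the name sample_list comes from the lesson text):

```python
# A list can mix data types: a string, an integer, and a float.
sample_list = ["Hello", 10, 3.5]

print(sample_list)  # ['Hello', 10, 3.5]
```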
Here:
- “Hello” (a string),
- 10 (an integer), and
- 3.5 (a float)
are all stored together in one list named sample_list.
Exercise:
- Create a list named book_2 that represents another book from our PageGarden data. Use the following details:
  - Title: "Berry Tales"
  - Price: 0.0
  - Currency: "USD"
  - Total reviews: 985500
  - Average rating: 4.5
In code:
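One possible solution for book_2, assuming the same column order as book_1 (title, price, currency, total reviews, average rating):

```python
# Same column order as book_1.
book_2 = ["Berry Tales", 0.0, "USD", 985500, 4.5]

print(book_2)  # ['Berry Tales', 0.0, 'USD', 985500, 4.5]
```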
- Create another list named book_3 for:
  - Title: "Moon Over Pine"
  - Price: 7.50
  - Currency: "USD"
  - Total reviews: 899000
  - Average rating: 4.6
These two exercises will help you get comfortable with creating lists. In the next section, we’ll learn how to access elements inside these lists.
Accessing Elements in a List (Indexing)
Now that we know how to create a list, the next step is understanding how to access the information stored inside it. Since lists often contain multiple elements, we need a systematic way to identify and retrieve each piece of data. In Python, this system is called indexing.
What is an index? An index is a numeric position assigned to each element in the list. Indexing in Python starts at 0 for the first element, 1 for the second, and so on. This zero-based indexing is a common feature in many programming languages.
Consider our book_1 list from the previous section, ["Whispering Leaves", 12.99, "USD", 1050300, 4.2].
Let’s write down the index numbers for each element:
- Index 0 → "Whispering Leaves"
- Index 1 → 12.99
- Index 2 → "USD"
- Index 3 → 1050300
- Index 4 → 4.2
If we want just the book’s title, "Whispering Leaves", we can access it by using its index: book_1[0]. To access the average rating (which is at index 4), we use book_1[4].
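A minimal sketch of both lookups. The variable names title and avg_rating are hypothetical, used only to hold the results.

```python
book_1 = ["Whispering Leaves", 12.99, "USD", 1050300, 4.2]

title = book_1[0]       # index 0 is the first element
avg_rating = book_1[4]  # index 4 is the fifth (last) element

print(title)       # Whispering Leaves
print(avg_rating)  # 4.2
```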
Why does indexing matter?
Indexing lets you pick out exactly the piece of data you need from a list. This is especially useful when you’re dealing with many items. Imagine that PageGarden.csv has thousands of rows: indexing allows you to write generic code to handle each row without having to manually name variables for each piece of data.
Indexing for Large Datasets
Soon, we’ll load the entire PageGarden.csv file as a list of lists. Each sublist will represent one book’s data. With indexing, we can easily select which part of the data we need, like extracting all the ratings to find their average.
Important details about indexing:
- The first element is always at index 0.
- If you try to access an index that doesn’t exist (for example, book_1[10]), Python will give you an IndexError because the index is out of range.
- You can use indexing with variables. For instance, if book_2 is ["Berry Tales", 0.0, "USD", 985500, 4.5], then book_2[3] will give you 985500 (the total number of reviews).
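The two details above can be sketched as follows. The try/except statement is not covered in this lesson; it is used here only as an assumption-free way to show the error without stopping the script.

```python
book_1 = ["Whispering Leaves", 12.99, "USD", 1050300, 4.2]
book_2 = ["Berry Tales", 0.0, "USD", 985500, 4.5]

print(book_2[3])  # 985500

# book_1 only has indices 0 through 4, so index 10 is out of range.
try:
    book_1[10]
except IndexError as error:
    print("IndexError:", error)  # IndexError: list index out of range
```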
Exercise:
- Given the book_2 list:
  - Extract the total number of reviews and store it in a variable named berry_reviews.
  - Print berry_reviews to confirm it’s 985500.
- Do the same for book_3:
  - Extract the average rating and store it in moon_rating.
  - Print moon_rating to confirm it’s 4.6.
By practicing indexing, you’ll be better prepared to handle large datasets, where indexing is essential for selecting and using the right parts of your data. In the next section, we’ll explore a related concept called negative indexing, which allows you to access elements from the end of a list.
Negative Indexing
We’ve seen that we can use positive numbers (0, 1, 2, …) to access elements in a list, starting from the front. In Python, there’s another helpful feature called negative indexing, which allows you to access elements from the end of the list backwards.
How does negative indexing work?
- -1 refers to the last element in the list.
- -2 refers to the second-to-last element.
- -3 refers to the third-to-last element, and so forth.
This might seem unusual at first, but it’s a very convenient shortcut when you often need to access the last few elements of a list without knowing its exact length.
Let’s revisit book_1, which is ["Whispering Leaves", 12.99, "USD", 1050300, 4.2].
Using positive indexing, book_1[4] gives us the average rating (4.2). Using negative indexing, book_1[-1] also returns 4.2, the last element of the list. It’s just another way to reach the same data.
Similarly, book_1[-2] would give us the second-to-last element, 1050300 (the total number of reviews).
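A short sketch comparing the positive and negative lookups on book_1:

```python
book_1 = ["Whispering Leaves", 12.99, "USD", 1050300, 4.2]

print(book_1[4])   # 4.2      (positive index of the last element)
print(book_1[-1])  # 4.2      (same element, counted from the end)
print(book_1[-2])  # 1050300  (second-to-last element)
```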
Why use negative indexing?
- If you’re always interested in the last element (like the most recent rating or the newest piece of data), negative indexing saves you from having to calculate the length of the list or remember which index number the last element has.
- For instance, when reading PageGarden.csv, if you want the last piece of data from each row (the average rating), you don’t need to know that it’s at index 4. You can just use [-1] to get it.
Be careful with your indexing!
- Negative indexing starts at -1. In Python, -0 is the same as 0, so [-0] points at the first element, not the last; the last element is always [-1].
- If you use an index that’s too far negative (like book_1[-10] when the list doesn’t have that many elements), you’ll get an IndexError just as you would with a positive index out of range.
Exercise:
- Using book_2 (["Berry Tales", 0.0, "USD", 985500, 4.5]), retrieve the average rating using negative indexing and store it in berry_rating.
- Using book_3 (["Moon Over Pine", 7.50, "USD", 899000, 4.6]), retrieve the total number of reviews using negative indexing. Hint: the total reviews are the second-to-last element.
With these exercises, you’ll become comfortable using negative indexing to quickly access elements at the end of your lists—another handy tool when dealing with data from large datasets. Next, we’ll learn how to access multiple elements at once using a technique called slicing.
Retrieving Multiple Elements (List Slicing)
So far, we’ve focused on accessing single elements from a list using positive and negative indexing. But what if we need to retrieve more than one element at a time? For example, we might want to separate the title, price, and currency of a book from its review counts and ratings. Instead of accessing each element individually, Python lets us use slicing to extract a range of elements at once.
What is slicing? Slicing allows you to specify a start and an end index, and Python will give you a new list containing the elements from the start index up to, but not including, the end index. The general syntax for slicing is some_list[start:end], where:
- start is the index where the slice begins (inclusive).
- end is the index where the slice ends (exclusive, meaning it does not include the element at this index).
For instance, consider our book_1 list again, ["Whispering Leaves", 12.99, "USD", 1050300, 4.2].
The indices for book_1 are:
- 0 → “Whispering Leaves”
- 1 → 12.99
- 2 → “USD”
- 3 → 1050300
- 4 → 4.2
If we want the first three elements (title, price, and currency) in a separate list, we can do:
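A minimal sketch of that slice. The name first_three is hypothetical, used only to hold the result.

```python
book_1 = ["Whispering Leaves", 12.99, "USD", 1050300, 4.2]

# Start at index 0 and stop just before index 3.
first_three = book_1[0:3]

print(first_three)  # ['Whispering Leaves', 12.99, 'USD']
```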
This will print ['Whispering Leaves', 12.99, 'USD'].
We used [0:3] to get elements at indices 0, 1, and 2. Notice that the element at index 3 is not included because slicing stops just before the end index.
Slicing shortcuts:
- If we omit the start index, Python starts from the beginning of the list. For example, book_1[:3] is the same as book_1[0:3].
- If we omit the end index, Python goes until the end of the list. For example, book_1[2:] would give us all elements starting from index 2 through the end of the list.
- If we use book_1[:], we get a copy of the entire list.
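The three shortcuts can be sketched like this. The name book_copy is hypothetical, and the `is` comparison is included only to show that the copy is a separate list object.

```python
book_1 = ["Whispering Leaves", 12.99, "USD", 1050300, 4.2]

print(book_1[:3])  # ['Whispering Leaves', 12.99, 'USD']
print(book_1[2:])  # ['USD', 1050300, 4.2]

# [:] builds a new list with the same elements.
book_copy = book_1[:]
print(book_copy == book_1)  # True  (same contents)
print(book_copy is book_1)  # False (a different list object)
```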
Why slicing is useful:
When we start working with the full PageGarden.csv as a list of lists, slicing becomes a powerful way to grab certain segments of our data. For instance, if each row has many columns and we only need the first few for a specific analysis, slicing makes this easy.
Another example:
If book_2 = ["Berry Tales", 0.0, "USD", 985500, 4.5] and we want just the price and currency, we know these are at indices 1 and 2:
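A short sketch of that slice. Because the end index is exclusive, indices 1 and 2 correspond to [1:3]. The name price_and_currency is hypothetical.

```python
book_2 = ["Berry Tales", 0.0, "USD", 985500, 4.5]

# Indices 1 and 2: start at 1, stop just before 3.
price_and_currency = book_2[1:3]

print(price_and_currency)  # [0.0, 'USD']
```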
Exercise:
- Using book_1, create a slice named metadata_1 that contains only the title, price, and currency.
- Using book_3 = ["Moon Over Pine", 7.50, "USD", 899000, 4.6], create a slice named rating_info that contains only the total number of reviews and the rating.
By practicing slicing, you’ll be able to quickly and easily extract the exact pieces of data you need. In the next section, we’ll discover how to store many rows of data (like multiple books) in a single list, making it even simpler to manage large datasets.
Storing Multiple Rows of Data (Lists of Lists)
So far, we’ve treated each book’s data as a single list with five elements: title, price, currency, total reviews, and average rating. But what if we have many books? Manually creating a separate variable for each book (like book_1, book_2, book_3, and so on) can quickly become overwhelming.
Instead, we can store all these individual lists inside another list, creating what is commonly known as a list of lists. This structure makes it easy to handle larger datasets, like the entire PageGarden.csv file, which might contain thousands of books.
Why a list of lists?
- If book_1 represents the first row, book_2 the second, book_3 the third, and so on, we can combine them into a single list called library_data.
- Each element of library_data would be one of these smaller lists, for example library_data = [book_1, book_2, book_3].
Now library_data is a single outer list that contains the three book lists, one after another.
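A minimal sketch of building and printing library_data from the three book lists used in this lesson:

```python
book_1 = ["Whispering Leaves", 12.99, "USD", 1050300, 4.2]
book_2 = ["Berry Tales", 0.0, "USD", 985500, 4.5]
book_3 = ["Moon Over Pine", 7.50, "USD", 899000, 4.6]

# Each element of the outer list is itself a list (one book).
library_data = [book_1, book_2, book_3]

print(library_data)
# [['Whispering Leaves', 12.99, 'USD', 1050300, 4.2], ['Berry Tales', 0.0, 'USD', 985500, 4.5], ['Moon Over Pine', 7.5, 'USD', 899000, 4.6]]
```

Note that Python prints the float 7.50 as 7.5; the value is unchanged.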
How do we use indexing here?
- library_data[0] would give you the entire book_1 list.
- library_data[1] would give you the entire book_2 list, and so on.
If we want to retrieve the title of the second book ("Berry Tales"), we can use library_data[1][0].
Notice how we used two indices here:
- library_data[1] to select the second list (the second book),
- [0] to select the first element of that list (the title).
This concept of using multiple indices in a row is called chained indexing, and it’s extremely useful for working with more complex data structures.
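A short sketch of chained indexing on library_data. The names second_book and second_title are hypothetical and only make the two steps visible.

```python
library_data = [
    ["Whispering Leaves", 12.99, "USD", 1050300, 4.2],
    ["Berry Tales", 0.0, "USD", 985500, 4.5],
    ["Moon Over Pine", 7.50, "USD", 899000, 4.6],
]

second_book = library_data[1]      # the whole inner list for the second book
second_title = library_data[1][0]  # first element of that inner list

print(second_book)   # ['Berry Tales', 0.0, 'USD', 985500, 4.5]
print(second_title)  # Berry Tales
```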
Advantages of lists of lists:
- You can loop through library_data to process every book at once.
- You can easily select individual pieces of data from any book by combining indexing and slicing.
- As we’ll see later, this structure is perfect for reading in data from files, like PageGarden.csv, directly into Python.
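As a sketch of the first advantage, a for loop visits each inner list in turn. The loop variable book and the list titles are hypothetical names; append() is explained later in the lesson.

```python
library_data = [
    ["Whispering Leaves", 12.99, "USD", 1050300, 4.2],
    ["Berry Tales", 0.0, "USD", 985500, 4.5],
    ["Moon Over Pine", 7.50, "USD", 899000, 4.6],
]

titles = []
for book in library_data:
    # book is one inner list; index 0 is the title, index -1 the rating.
    print(book[0], book[-1])
    titles.append(book[0])

print(titles)  # ['Whispering Leaves', 'Berry Tales', 'Moon Over Pine']
```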
Exercise:
- Create a library_data list of lists from the three book lists you’ve already defined.
- Print library_data to see how it looks.
- Retrieve the average rating of the third book using chained indexing. For book_3 = ["Moon Over Pine", 7.50, "USD", 899000, 4.6], the rating is at index 4.
By organizing data this way, handling more rows becomes much simpler. In the next section, we’ll start learning how to read data directly from a file and transform it into a list of lists, just like we did by hand.
Opening and Reading a Data File
Up to this point, we’ve been manually creating lists for each book. In a real-world scenario, you’ll often receive data in the form of a file rather than typing it out yourself. Python makes it possible to open and read files so that you can transform their contents into lists and then analyze the data.
The PageGarden.csv file
Recall that PageGarden.csv is a file (a text document) containing many rows of data about books. Each row lists five comma-separated values in this order: title,price,currency,total_reviews,avg_rating.
For example, the row for "Whispering Leaves" is "Whispering Leaves",12.99,USD,1050300,4.2.
The first row is the header row, which describes what each column represents. Every subsequent row describes one book. When we read the file into Python, our goal is to end up with a list of lists, one sublist per book, just like library_data but on a much larger scale.
How to open a file in Python:
You can use the built-in open() function to open a file. For example, opened_file = open("PageGarden.csv") does two things:
- open("PageGarden.csv") tries to find a file named “PageGarden.csv” in the current working directory (usually the folder you run your Python code from).
- The result is something called a “file object,” which we store in opened_file.
Reading the file’s contents:
After opening the file, you can read its entire contents into a single string using the read() method, for example file_contents = opened_file.read().
file_contents will now contain the entire text of PageGarden.csv in one long string. This includes the header row and all the data rows, separated by newline characters (\n).
What does file_contents look like?
If you print file_contents[:300], you’ll see the first 300 characters. This helps you verify that you’re reading the file correctly. The output should start with the header row, title,price,currency,total_reviews,avg_rating, followed by the first few book rows.
Closing the file: When you finish reading the file, it’s good practice to close it by calling opened_file.close().
This frees up resources on your computer. Even though Python often does this automatically when your program ends, it’s a good habit to close files after you’re done with them.
Why read into a string first? Right now, we have all the data in one long string. That’s not very convenient for analysis. In the next sections, we’ll learn how to break this string apart—first into separate rows, and then into separate columns—so that we can build a list of lists. Once we have a list of lists, we can use all the indexing and slicing techniques we’ve practiced to analyze the dataset.
Exercise:
- Open the PageGarden.csv file and store the file object in a variable named opened_file.
- Read the file’s contents into a variable named file_contents.
- Print out the first 200 characters of file_contents to check that it worked.
- Close the file.
Example:
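The four steps might look like the sketch below. Since PageGarden.csv is not bundled with this page, the sketch first writes a small stand-in file containing only the sample rows quoted in this lesson; the names sample_text, csv_path, and writer are hypothetical. With your own copy of the file, you would skip that part and call open("PageGarden.csv") directly.

```python
import os
import tempfile

# Assumption: create a tiny stand-in for PageGarden.csv in a temporary folder
# so that this example runs on its own.
sample_text = (
    "title,price,currency,total_reviews,avg_rating\n"
    '"Whispering Leaves",12.99,USD,1050300,4.2\n'
    '"Berry Tales",0.0,USD,985500,4.5\n'
    '"Moon Over Pine",7.50,USD,899000,4.6\n'
)
csv_path = os.path.join(tempfile.mkdtemp(), "PageGarden.csv")
writer = open(csv_path, "w")
writer.write(sample_text)
writer.close()

# The steps from the exercise: open, read, inspect, close.
opened_file = open(csv_path)
file_contents = opened_file.read()
print(file_contents[:200])
opened_file.close()
```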
If everything’s working correctly, you’ll see the header row followed by some of the data. In the next section, we’ll learn how to split the string into rows and eventually convert those rows into lists.
From a Single String to Separate Rows
We now have the entire contents of PageGarden.csv stored in a single string. While this is a good start, having all the data in one long string isn’t very convenient for analysis. Our next goal is to break this large string into smaller, more manageable pieces.
Remember the structure of the CSV file: the first line is the header row (title,price,currency,total_reviews,avg_rating), and each subsequent line describes one book.
Each line therefore represents a different row of data.
Splitting by newline characters:
Inside the string, each new line of the file is represented by a special character called a newline (\n). If we “split” the string whenever we encounter a \n, we get a list where each element is one line of the file.
For example:
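A minimal sketch of the split. Here file_contents is a hand-written stand-in for the string returned by read(), limited to the sample rows quoted in this lesson and assumed to have no trailing newline.

```python
# Stand-in for the text of PageGarden.csv (sample rows only).
file_contents = (
    "title,price,currency,total_reviews,avg_rating\n"
    '"Whispering Leaves",12.99,USD,1050300,4.2\n'
    '"Berry Tales",0.0,USD,985500,4.5\n'
    '"Moon Over Pine",7.50,USD,899000,4.6'
)

# Break the string apart at every newline character.
rows = file_contents.split("\n")

print(len(rows))  # 4
print(rows[0])    # title,price,currency,total_reviews,avg_rating
print(rows[1])    # "Whispering Leaves",12.99,USD,1050300,4.2
```

If the file's text ends with a newline, split("\n") also returns an empty string as the last element.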
After this step, rows is a list where:
- rows[0] is the header: 'title,price,currency,total_reviews,avg_rating'
- rows[1] is the first book’s data: '"Whispering Leaves",12.99,USD,1050300,4.2'
- rows[2] is the second book’s data: '"Berry Tales",0.0,USD,985500,4.5'
- and so forth.
Why is this helpful? Now that we have each row as a separate string, it’s easier to work with. We can:
- Inspect individual rows to understand the data better.
- Further split each row by commas to separate the columns.
- Remove the header row if we only need to analyze the data.
Check the result: You can print the first few rows, for example with print(rows[:3]), to see what they look like.
You should see the header string followed by the first two book rows, each as one string.
Dealing with extra newline characters:
Sometimes, CSV files may have blank lines at the end or in between. If that happens, you might get empty strings in your rows list. You can filter them out using techniques we’ll learn later, or just be aware that they can appear and handle them gracefully.
Next steps: Now that we’ve split the data by lines, our next goal is to split each line by commas so that each line becomes a list of individual values. Eventually, we’ll transform this into a list of lists—one list per book—making it easy to perform calculations like average ratings across all books.
Exercise:
- Assuming you have file_contents from the previous step, split it into rows by using rows = file_contents.split("\n").
- Print the first three rows (rows[:3]) to inspect the data.
- Look at rows[0] and identify the headers.
- Look at rows[1] and identify which parts represent the title, price, currency, total reviews, and average rating.
This process transforms an unstructured string into a list of rows—bringing us one step closer to a clean, easily analyzable data structure. Next, we’ll tackle splitting each row into individual columns.
Splitting Rows into Columns
We’ve successfully split our large string into individual rows, each of which is still just one long piece of text. Remember, each row of the CSV file follows the same pattern: title,price,currency,total_reviews,avg_rating.
Within each row, the values are separated by commas. If we can split each row by the comma character (","), we can isolate the individual pieces of data (title, price, currency, total reviews, and avg_rating) and store them in a list. Ultimately, this will give us a list of lists, where each inner list corresponds to one book.
How to split by commas:
If rows is a list of strings, where each element is one row of the file, we can select a specific row and split it again:
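A minimal sketch, using a two-element stand-in for rows built from the sample data. It assumes no title contains a comma; a title with a comma inside it would be cut into two pieces by this simple split.

```python
# Stand-in for the first two elements of rows.
rows = [
    "title,price,currency,total_reviews,avg_rating",
    '"Whispering Leaves",12.99,USD,1050300,4.2',
]

# Split the first data row at every comma.
columns = rows[1].split(",")

print(columns)  # ['"Whispering Leaves"', '12.99', 'USD', '1050300', '4.2']
```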
After splitting by the comma, columns might look like this: ['"Whispering Leaves"', '12.99', 'USD', '1050300', '4.2'].
Notice that the title is surrounded by quotation marks and appears as '"Whispering Leaves"'. The other values look cleaner. We’ll later learn how to remove these quotes or handle them. For now, the important part is that we have successfully broken the row into separate pieces.
Doing this for all rows:
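Eventually, we’ll loop over each element in rows (except the header) and split by commas to turn every row into a list of values. This will give us a list with one inner list per book, such as ['"Whispering Leaves"', '12.99', 'USD', '1050300', '4.2'] for the first book.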
Header vs. Data Rows:
- The first row (rows[0]) is the header row. It shows column names rather than data about a specific book.
- From rows[1] onwards, we have data rows representing individual books.
When we start analyzing the data, we may not need the header row. Often, data analysts store the header row separately and then focus on the data rows when performing calculations.
Next steps: Once we have each row split into columns, we can:
- Convert numeric values (like total_reviews and avg_rating) from strings into integers or floats.
- Clean up any extra characters (like quotation marks).
- Perform calculations, such as finding the average rating of all books.
Exercise:
- Select a data row, for example row = rows[1].
- Split row by commas, for example with columns = row.split(",").
- Identify which elements correspond to the title, price, currency, total reviews, and rating.
- Print out the columns list to confirm that the data is now separated into individual strings.
By splitting each row into columns, we now have the building blocks we need to create a full list of lists structure, one step closer to straightforward data analysis. In the next section, we’ll transform all rows, not just one, into lists of columns and combine them into a single data structure.
Converting All Rows into a List of Lists
We now know how to transform one row of CSV data into a list of individual values by splitting on commas. The next step is to apply this process to every row in rows to create a list of lists: a structure where each element is a small list representing one book’s data.
What we have so far:
- rows: a list of strings, where each string is one line from PageGarden.csv.
- The first element of rows (rows[0]) is the header: "title,price,currency,total_reviews,avg_rating".
- Each subsequent element (like rows[1], rows[2], etc.) is a data row, something like: "\"Whispering Leaves\",12.99,USD,1050300,4.2"
Our goal:
- Turn each row into a list of values by splitting it on ",".
- Collect all these lists into a single variable, such as data_as_lists.
After we do this, we’ll end up with one inner list per line of the file, for example ['"Whispering Leaves"', '12.99', 'USD', '1050300', '4.2'] for the first book.
This structure is much easier to work with because now we can loop over it, access individual elements with indexing, and convert numbers from strings to floats or integers as needed.
How to do this conversion:
- Initialize an empty list to store all the rows: data_as_lists = [].
- Loop over each element in rows. For each row in rows:
  - Split the row by commas to get a list of values.
  - Append that list of values to data_as_lists.
For example:
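The steps above can be sketched as follows, with rows written out by hand from the sample data so the example runs on its own. It again assumes no title contains a comma.

```python
# Stand-in for the result of file_contents.split("\n").
rows = [
    "title,price,currency,total_reviews,avg_rating",
    '"Whispering Leaves",12.99,USD,1050300,4.2',
    '"Berry Tales",0.0,USD,985500,4.5',
    '"Moon Over Pine",7.50,USD,899000,4.6',
]

data_as_lists = []
for row in rows:
    columns = row.split(",")       # one string becomes a list of values
    data_as_lists.append(columns)  # add that list to the outer list

print(len(data_as_lists))  # 4
print(data_as_lists[0])    # ['title', 'price', 'currency', 'total_reviews', 'avg_rating']
print(data_as_lists[1])    # ['"Whispering Leaves"', '12.99', 'USD', '1050300', '4.2']
```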
After the loop finishes, data_as_lists[0] should be the header row as a list, and data_as_lists[1] should be the first book’s data as a list, etc.
Inspecting the result:
- Print len(data_as_lists) to see how many rows (including the header) you have.
- Print data_as_lists[0] to see the header row as a list.
- Print data_as_lists[1] to see the first book’s data as a list.
Cleaning and conversion:
Notice that some of the values may still have quotation marks (like '"Whispering Leaves"') and all numeric values (like '1050300' and '4.2') are still strings. In future steps, we’ll handle this by removing unwanted characters and converting numbers from strings to the appropriate numeric types. For now, focus on verifying that you have a list of lists structure.
Exercise:
- Create an empty list called data_as_lists.
- Loop through all the rows in rows and split each row into columns.
- Append each list of columns to data_as_lists.
- Print the first five elements of data_as_lists to confirm that the transformation worked correctly.
By completing this task, you are turning a raw CSV file into a structured Python data variable that’s ready for analysis. In the next sections, we’ll learn how to work with this data more effectively, including cleaning up unwanted characters, converting types, and eventually calculating interesting metrics like the average rating of all books.
Cleaning and Converting Data Types
Now that we have data_as_lists, a list of lists where each inner list represents one row from PageGarden.csv, we need to clean and prepare this data for analysis. Two common tasks at this stage are:
- Removing unwanted characters, like extra quotation marks in titles.
- Converting numeric values (like total reviews and average rating) from strings to integers or floats, so we can perform calculations.
Why do we need to do this?
- If we leave the quotation marks in titles, it might be harder to display the data nicely or match it against other information.
- If we keep numbers as strings, we can’t easily add them up, find their averages, or perform other mathematical operations.
Example: Titles with quotes
If the title appears as '"Whispering Leaves"', we may want to remove the extra quotes. There are several ways to clean strings in Python; one simple approach is to use the strip() method to remove leading and trailing characters:
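A minimal sketch of strip() on a quoted title. The names raw_title and clean_title are hypothetical.

```python
raw_title = '"Whispering Leaves"'

# Passing '"' tells strip() which character to remove from both ends.
clean_title = raw_title.strip('"')

print(raw_title)    # "Whispering Leaves"
print(clean_title)  # Whispering Leaves
```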
This removes any " characters from the start and end of the string, leaving you with a cleaner title.
Converting numeric values
Right now, values like '1050300' or '4.2' are stored as strings. To work with them as numbers:
- Use int() for whole numbers (like total reviews).
- Use float() for decimal numbers (like average rating).
For example:
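A short sketch of both conversions, using the string values quoted above. The variable names are hypothetical.

```python
total_reviews = int("1050300")  # string of digits -> integer
avg_rating = float("4.2")       # string with a decimal point -> float

print(total_reviews, type(total_reviews))  # 1050300 <class 'int'>
print(avg_rating, type(avg_rating))        # 4.2 <class 'float'>
```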
Applying this to the dataset:
- The header row is probably fine as strings since it’s just column names.
- For each data row, we might do something like:
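For a single data row, the cleanup might look like the sketch below. The row is written out by hand in the form produced by split(","), and the index positions assume the column order title, price, currency, total_reviews, avg_rating.

```python
row = ['"Whispering Leaves"', "12.99", "USD", "1050300", "4.2"]

row[0] = row[0].strip('"')  # title: remove the surrounding quotes
row[1] = float(row[1])      # price: string -> float
row[3] = int(row[3])        # total reviews: string -> int
row[4] = float(row[4])      # average rating: string -> float

print(row)  # ['Whispering Leaves', 12.99, 'USD', 1050300, 4.2]
```

The currency at index 2 is left as a string because it is text, not a number.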
After these conversions, the data is in a much better format for analysis. You can now easily sum up ratings, calculate averages, and sort the data based on numeric values.
Exercise:
- Select a row from data_as_lists (other than the header). For example: row = data_as_lists[1].
- Clean the title by removing extra quotes.
- Convert the price and avg_rating to floats.
- Convert total_reviews to an integer.
- Print the cleaned and converted row to confirm the changes.
By cleaning and converting the data, you’ve made it ready for analysis. In the next sections, you’ll learn how to loop over the entire dataset to apply these transformations to every row, and then start performing calculations like finding the average rating across all books.
Applying Transformations to the Entire Dataset
Now that you understand how to clean and convert data for a single row, it’s time to apply these steps to every row in the dataset. This is where the power of loops really shines. Instead of manually cleaning each row, you can write a loop to process all the rows automatically.
What needs to be done for each row?
- Skip the header row, since it doesn’t contain numerical data.
- Remove any unwanted quotation marks from the title.
- Convert the price and average rating from strings to floats.
- Convert the total number of reviews from a string to an integer.
After this process, every data row in data_as_lists will be in a uniform, clean format, which makes analysis much simpler.
Example:
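One way to write this loop is sketched below. It assumes an index-based loop with range() and len() (neither is explained in this lesson) so that it can start at i = 1 and skip the header; data_as_lists is written out by hand from the sample data.

```python
data_as_lists = [
    ["title", "price", "currency", "total_reviews", "avg_rating"],
    ['"Whispering Leaves"', "12.99", "USD", "1050300", "4.2"],
    ['"Berry Tales"', "0.0", "USD", "985500", "4.5"],
    ['"Moon Over Pine"', "7.50", "USD", "899000", "4.6"],
]

# Start at 1 so the header row (index 0) is left untouched.
for i in range(1, len(data_as_lists)):
    row = data_as_lists[i]
    row[0] = row[0].strip('"')  # clean the title
    row[1] = float(row[1])      # price
    row[3] = int(row[3])        # total reviews
    row[4] = float(row[4])      # average rating

print(data_as_lists[0])  # ['title', 'price', 'currency', 'total_reviews', 'avg_rating']
print(data_as_lists[1])  # ['Whispering Leaves', 12.99, 'USD', 1050300, 4.2]
```

Because row refers to the same inner list that is stored in data_as_lists, changing row[0] and the other positions updates data_as_lists directly.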
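After running this loop, every row in data_as_lists (except the header) will be cleaned and converted into the correct data types. For example, data_as_lists[1] might now be ['Whispering Leaves', 12.99, 'USD', 1050300, 4.2], while data_as_lists[0] still holds the column names as strings.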
Why do this for the entire dataset? Having your entire dataset properly formatted means you can now:
- Calculate statistics, like the average rating for all books.
- Sort the data by total reviews or rating.
- Filter the data to find books above or below certain thresholds.
All of these tasks require your data to be in a consistent and numeric-friendly format, which is exactly what this step achieves.
Exercise:
- Write a loop that starts from i = 1 and goes through all the rows of data_as_lists.
- For each row:
  - Clean the title by stripping quotes.
  - Convert the price and avg_rating to floats.
  - Convert total_reviews to an int.
- After the loop, print a few rows from data_as_lists to ensure the changes took effect.
With this transformation step completed, your dataset is now truly ready for data analysis. In the next sections, we’ll learn how to utilize these clean values to derive meaningful insights, like computing averages and other statistics from the data.
Calculating the Average Rating
Now that your data is clean and each book’s rating is stored as a float, you can start performing calculations. One of the simplest and most common tasks is to find the average rating of all the books in the dataset. This is a straightforward operation now that ratings are numeric values (floats).
How to calculate an average:
- Sum all the ratings.
- Divide the sum by the number of ratings.
If you want the average rating of all books:
- Make sure to skip the header row, since it doesn’t contain ratings.
- Loop over each data row.
- Extract the rating from the appropriate index (for our dataset, this should be at index 4).
- Add it to a running total.
- After processing all rows, divide the total by the number of rows (excluding the header).
Example:
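The steps above can be sketched as follows, using a hand-written, already cleaned data_as_lists with the three sample books. The names data_rows, rating_sum, and average_rating are hypothetical.

```python
data_as_lists = [
    ["title", "price", "currency", "total_reviews", "avg_rating"],
    ["Whispering Leaves", 12.99, "USD", 1050300, 4.2],
    ["Berry Tales", 0.0, "USD", 985500, 4.5],
    ["Moon Over Pine", 7.50, "USD", 899000, 4.6],
]

data_rows = data_as_lists[1:]  # every row except the header

rating_sum = 0.0
for row in data_rows:
    rating_sum = rating_sum + row[4]  # the rating is at index 4

average_rating = rating_sum / len(data_rows)
print(round(average_rating, 2))  # 4.43
```

Dividing by len(data_rows) rather than len(data_as_lists) keeps the header out of the count.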
What next?
With the average rating, you have a measure of how the books in the PageGarden.csv dataset generally perform according to user reviews. You can apply similar techniques to find:
- The average price of books.
- The minimum or maximum number of reviews.
- The most common currency (if the dataset had multiple currencies).
Exercise:
- Using your cleaned data_as_lists, calculate the average rating of all books.
- Print out the result.
By doing this, you’ve taken raw data from a file, turned it into a manageable structure, cleaned and converted it, and finally performed a calculation that yields a meaningful insight about the dataset. In the next section, we’ll wrap up what we’ve learned and look ahead to more advanced techniques.
Next Steps
In this lesson, you learned:
- How to store related data points in lists.
- How to access elements and slices within a list.
- How to use for loops for repetitive tasks.
- How to read file data and transform it into lists of lists.
- How to compute averages from large datasets.
These skills lay the groundwork for more sophisticated data analysis techniques. As you move forward, you’ll discover new data structures, learn about filtering and sorting data, and eventually explore how these fundamentals link to more advanced analytical tasks, including working alongside AI-driven tools.
By mastering lists and loops now, you’re building a strong foundation for all your future data analysis endeavors.