Lesson 7 - Beyond the Basics: Python Dictionaries and Frequency Tables
In this lesson, we’ll build on what we learned about dictionaries in the previous lesson and use them to create frequency tables — a powerful tool for summarizing data. You’ll learn how to check if a value exists in a dictionary, update values, and use dictionaries to count and analyze data in a real dataset.
We’ll work with a dataset of iOS apps from the Apple App Store, and we’ll focus on a column called cont_rating, which represents the content rating (or age recommendation) for each app. Here’s how many apps in the dataset carry each rating:
Content Rating | Number of Apps |
---|---|
4+ | 4433 |
9+ | 987 |
12+ | 1155 |
17+ | 622 |
As you can see, most apps are rated 4+, meaning they’re suitable for ages 4 and up, while fewer apps are rated 17+.
In this lesson, we’ll learn how to count values like this using dictionaries.
Checking for Dictionary Membership
Once we’ve created a dictionary, we can check whether a particular key exists in it using the in operator. This is a very useful way to validate data before accessing or modifying it.
Let’s take a look at an example:
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
print('12+' in content_ratings)
Output:
True
This returns True because '12+' is indeed a key in the content_ratings dictionary.
Now try checking for something that doesn’t exist:
print('10+' in content_ratings)
Output:
False
Here’s something important to remember: the in operator checks only the keys, not the values. So even though 4433 exists as a value, this will return False:
print(4433 in content_ratings) # False
We can also use this inside if statements:
if '17+' in content_ratings:
    print("It exists")
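Because in looks only at keys, searching for a value needs the dictionary's values() view instead. A minimal sketch, reusing the dictionary from above (the variable names key_found, value_as_key, and value_found are only illustrative):

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# `in` on the dictionary itself looks only at the keys
key_found = '12+' in content_ratings            # '12+' is a key
value_as_key = 4433 in content_ratings          # 4433 is a value, not a key

# To search the values, check the values() view explicitly
value_found = 4433 in content_ratings.values()

print(key_found, value_as_key, value_found)
```

Checking the values() view scans every value, so it is slower than a key lookup on large dictionaries.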
Let’s try this in practice.
🧪 Practice Task
# Given dictionary
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
# Check for dictionary membership
is_in_dictionary_1 = '9+' in content_ratings
is_in_dictionary_2 = 987 in content_ratings
# Use in with an if statement
if '17+' in content_ratings:
    result = "It exists"
print(result)
Next, we’ll learn how to update values in a dictionary and use them for counting.
Updating Dictionary Values
Once you’ve created a dictionary, you can easily change any of the values by referencing the key you want to update. This is especially helpful when you’re trying to correct data or track counts.
Here’s how it works.
Let’s say we have a dictionary where some values are incorrect:
content_ratings = {'4+': 622, '12+': '1155', '9+': 987, '17+': 4433}
It looks like the values for '4+' and '17+' are swapped, and the value for '12+' is stored as a string instead of an integer. Let’s fix all that step by step.
Step-by-step Breakdown
Swap two values using a temporary variable:
temp = content_ratings['4+']
content_ratings['4+'] = content_ratings['17+']
content_ratings['17+'] = temp
Convert a string to an integer:
content_ratings['12+'] = int(content_ratings['12+'])
Print the result:
print(content_ratings)
Now the dictionary is correct again:
{'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
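Python can also swap two values without a temporary variable by using tuple assignment. This is an alternative to the approach above, not a replacement for it. The sketch below starts from the same incorrect dictionary:

```python
content_ratings = {'4+': 622, '12+': '1155', '9+': 987, '17+': 4433}

# The right-hand side is evaluated first, so both old values are read
# before either key is reassigned
content_ratings['4+'], content_ratings['17+'] = (
    content_ratings['17+'],
    content_ratings['4+'],
)

# Convert the string value to an integer, as before
content_ratings['12+'] = int(content_ratings['12+'])

print(content_ratings)
```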
🧪 Practice Task
Let’s do the same together.
# Given dictionary with swapped and incorrect data
content_ratings = {'4+': 622, '12+': '1155', '9+': 987, '17+': 4433}
# Step 1: Swap '4+' and '17+'
temp = content_ratings['4+']
content_ratings['4+'] = content_ratings['17+']
content_ratings['17+'] = temp
# Step 2: Convert '12+' value from string to integer
content_ratings['12+'] = int(content_ratings['12+'])
# Step 3: Print the corrected dictionary
print(content_ratings)
This kind of value updating is very common when you’re processing real-world datasets, where data might be stored in the wrong format or need adjustment.
Up next, we’ll take this one step further — using dictionaries to count things automatically.
Counting with Dictionaries
Imagine you have a list of content ratings from several apps:
ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']
You want to count how many times each rating appears. Sure, you could call .count() on the list once per rating, but each call rescans the whole list. A dictionary used as a frequency table collects every count in a single pass, and it adapts easily to more complex conditions.
Step-by-step: Manual Frequency Count
Let’s manually create a dictionary with keys for each unique content rating and a value of 0:
content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}
Now we loop through the ratings list and increment the count for each rating:
ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']
for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
Check the result:
print(content_ratings)
Output:
{'4+': 3, '9+': 2, '12+': 1, '17+': 1}
Great! We now have a dictionary that counts how many times each content rating appears.
Let’s make it even clearer by printing the dictionary after each step of the loop:
for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    print(content_ratings)
This helps you see how the dictionary updates in real time.
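For comparison, here is a small sketch that builds the same table both ways: with the dictionary loop and with the .count() method mentioned earlier. The name counts_via_count is only illustrative. Both give the same result on this list; the difference is that .count() scans the whole list once per rating.

```python
ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

# Dictionary approach: a single pass over the list
content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}
for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

# list.count() approach: one full scan of the list per rating
counts_via_count = {}
for key in content_ratings:
    counts_via_count[key] = ratings.count(key)

print(content_ratings == counts_via_count)
```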
🧪 Your Turn: Use It on Real Data
Let’s now do this on real data from the Apple App Store.
We already know our content ratings: 4+, 9+, 12+, and 17+.
We’ll now read the Apple Store dataset and count how many apps fall into each category:
# Step 1: Initialize dictionary with zero values
content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}
# Step 2: Open and read the dataset
from csv import reader
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
apps_data = list(read_file)
opened_file.close()
# Step 3: Loop through each row (skip header)
for row in apps_data[1:]:
    c_rating = row[10]  # index 10 = cont_rating
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
# Step 4: See the frequency table
print(content_ratings)
Expected output:
{'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
Awesome, we just built a frequency table with Python dictionaries!
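A common way to make sure the file gets closed is to read it inside a with block. The sketch below shows that pattern on a tiny made-up sample held in memory, so it runs without the real file. The sample rows and header names are assumptions chosen only so that the content rating sits at index 10, as in the lesson.

```python
import io
from csv import reader

# Made-up sample standing in for AppleStore.csv (not real data);
# only the column at index 10, the content rating, matters here
sample_csv = io.StringIO(
    "id,track_name,size_bytes,currency,price,rating_count_tot,"
    "rating_count_ver,user_rating,user_rating_ver,ver,cont_rating\n"
    "1,App A,100,USD,0.0,10,1,4.0,4.0,1.0,4+\n"
    "2,App B,200,USD,0.99,20,2,3.5,3.5,2.1,12+\n"
    "3,App C,300,USD,0.0,30,3,4.5,4.5,1.2,4+\n"
)

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

# With the real file this line would be:
#     with open('AppleStore.csv', encoding='utf8') as f:
with sample_csv as f:
    apps_data = list(reader(f))   # the file is closed when the block ends

for row in apps_data[1:]:         # skip the header row
    c_rating = row[10]
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

print(content_ratings)
```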
Finding Unique Values While Counting
In the previous example, we had prior knowledge of all the unique content ratings (4+, 9+, 12+, 17+), so we created a dictionary with those as keys. But what if you’re working with a column and you don’t know in advance what values are in it?
Let’s simulate that with our same ratings list:
ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']
We’ll start with an empty dictionary, and for each content rating we see, we’ll check:
- If the key already exists, increment its count.
- If the key doesn’t exist, add it and start the count at 1.
Here’s how we do that:
content_ratings = {}
for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    else:
        content_ratings[c_rating] = 1
Check the output:
print(content_ratings)
Result:
{'4+': 3, '9+': 2, '12+': 1, '17+': 1}
✅ Why 1, not 0?
We start at 1 (not 0) when we add a new key because we’re counting the first occurrence of that value. It has already appeared once by the time we add it.
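The if/else pattern above can also be written more compactly. The sketch below shows two alternatives that are not part of the lesson's approach: dict.get with a default of 0, and collections.Counter from the standard library. The name counted is only illustrative.

```python
from collections import Counter

ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

# Option 1: dict.get returns the current count, or 0 if the key is new
content_ratings = {}
for c_rating in ratings:
    content_ratings[c_rating] = content_ratings.get(c_rating, 0) + 1

# Option 2: Counter does the same counting in one call
counted = Counter(ratings)

print(content_ratings)
print(dict(counted))
```

Both produce the same counts as the explicit if/else version, which is still worth knowing because it makes each step visible.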
Let’s do it on the actual dataset!
This time we’ll apply this logic to the Apple App Store data — but we won’t assume which ratings are in there.
# Step 1: Create an empty dictionary
content_ratings = {}
# Step 2: Open and read the dataset
from csv import reader
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
apps_data = list(read_file)
opened_file.close()
# Step 3: Loop through rows (skip header)
for row in apps_data[1:]:
    c_rating = row[10]  # index 10 = cont_rating
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    else:
        content_ratings[c_rating] = 1
# Step 4: Print the result
print(content_ratings)
This will give you the real content rating frequency table — even if there are new or unexpected ratings in the data.
Seeing Changes Over Time
To better understand what’s going on as the loop runs, you can add a print statement inside the loop:
print(content_ratings)
That way, you’ll see how the dictionary grows with each new app row — it’s like watching the frequency table build itself step-by-step.
Now that we’ve built a frequency table using a dictionary, let’s take the next step — converting those raw counts into proportions and percentages.
This helps us answer questions like:
- What fraction of all apps are rated 4+?
- What percentage of apps are suitable for a 15-year-old?
From Counts to Proportions and Percentages
Here’s our frequency table again (we’ll hard-code it here for simplicity):
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197
🔸 Proportion
A proportion is the part relative to the total. For example:
proportion_4_plus = 4433 / 7197 # about 0.616 (around 62%)
But doing that manually for each value is a pain. Instead, let’s use a loop to update every value in the dictionary:
for rating in content_ratings:
    content_ratings[rating] /= total_number_of_apps
Now each value is a proportion between 0 and 1.
🔸 Percentage
Want to see the values as percentages? Multiply each proportion by 100:
for rating in content_ratings:
    content_ratings[rating] *= 100
Now each dictionary value represents a percentage.
Full example:
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197
for rating in content_ratings:
    content_ratings[rating] /= total_number_of_apps
    content_ratings[rating] *= 100
print(content_ratings)
You’ll get something like this (shown here rounded to two decimal places; the raw output has more digits):
{'4+': 61.60, '9+': 13.71, '12+': 16.05, '17+': 8.64}
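The loop above leaves long floating-point values in the dictionary. A minimal sketch of one way to tidy them is the built-in round() function, applied as each percentage is stored. Rounding inside the loop is a choice made for this example, not a step from the lesson.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197

for rating in content_ratings:
    percentage = content_ratings[rating] / total_number_of_apps * 100
    # Keep two decimal places for display
    content_ratings[rating] = round(percentage, 2)

print(content_ratings)
```

Rounded values are fine for display, but keep the unrounded numbers if you plan to do further arithmetic with them.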
Let’s Answer Two Real Questions
✅ 1. What percentage of apps are rated 17+?
percentage_17_plus = content_ratings['17+']
Result: 8.64% of all apps are rated 17+.
✅ 2. What percentage of apps can a 15-year-old use?
A 15-year-old can use apps rated 4+, 9+, and 12+. So we add those percentages:
percentage_15_allowed = (
    content_ratings['4+'] +
    content_ratings['9+'] +
    content_ratings['12+']
)
Or:
percentage_15_allowed = 100 - content_ratings['17+']
Either way, it gives you around 91.36%.
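The two calculations can be checked against each other. Because floating-point sums may differ in their last digits, the sketch below compares them with math.isclose instead of ==. The names percentages, by_adding, and by_subtracting are only illustrative.

```python
import math

content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197

# Build the percentages in a separate dictionary
percentages = {}
for rating in content_ratings:
    percentages[rating] = content_ratings[rating] / total_number_of_apps * 100

by_adding = percentages['4+'] + percentages['9+'] + percentages['12+']
by_subtracting = 100 - percentages['17+']

# Compare with a tolerance rather than exact equality
print(round(by_adding, 2), round(by_subtracting, 2))
print(math.isclose(by_adding, by_subtracting))
```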
Summary
- Proportion: divide each frequency by total.
- Percentage: multiply each proportion by 100.
- Use dictionary keys to update values in a loop.
- Now we can turn raw data into insights.
Let’s now take a look at how we can keep the original frequency data intact while also computing proportions and percentages using separate dictionaries.
This is super useful if you want to refer back to the original counts later (which is very common in data analysis).
Why Separate Dictionaries?
Previously, we updated the content_ratings dictionary in-place:
content_ratings['4+'] /= total_number_of_apps
This overwrites the original count with the proportion (e.g., 4433 → 0.616), so the count is no longer available.
But what if you need both the original frequency and the calculated proportion?
➡️ Solution: Create two new dictionaries:
- One for proportions
- One for percentages
Creating a Proportions Dictionary
Start with the original:
content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197
Now let’s create a new dictionary to store proportions:
c_ratings_proportions = {}
for key in content_ratings:
    proportion = content_ratings[key] / total_number_of_apps
    c_ratings_proportions[key] = proportion
Now c_ratings_proportions contains (rounded to four decimal places):
{
    '4+': 0.6160,
    '12+': 0.1605,
    '9+': 0.1371,
    '17+': 0.0864
}
Creating a Percentages Dictionary
Same idea, but multiply each proportion by 100:
c_ratings_percentages = {}
for key in c_ratings_proportions:
    percentage = c_ratings_proportions[key] * 100
    c_ratings_percentages[key] = percentage
Now you have (rounded to two decimal places):
{
    '4+': 61.60,
    '12+': 16.05,
    '9+': 13.71,
    '17+': 8.64
}
Or Do Both in One Loop?
You can also create both dictionaries at once:
c_ratings_proportions = {}
c_ratings_percentages = {}
for key in content_ratings:
    proportion = content_ratings[key] / total_number_of_apps
    percentage = proportion * 100
    c_ratings_proportions[key] = proportion
    c_ratings_percentages[key] = percentage
This avoids looping through twice — more efficient!
Final Code Summary
content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197
c_ratings_proportions = {}
c_ratings_percentages = {}
for key in content_ratings:
    proportion = content_ratings[key] / total_number_of_apps
    percentage = proportion * 100
    c_ratings_proportions[key] = proportion
    c_ratings_percentages[key] = percentage
print(c_ratings_proportions)
print(c_ratings_percentages)
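The same idea can be wrapped in a small reusable function. The sketch below is one possible way to do it: to_proportions is a hypothetical helper name, and it computes the total with sum() instead of hard-coding 7197. It also uses dictionary comprehensions, a compact alternative to the explicit loops above.

```python
def to_proportions(freq_table):
    """Return a new dict of proportions; the input dict is left unchanged."""
    total = sum(freq_table.values())
    return {key: count / total for key, count in freq_table.items()}

content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}

c_ratings_proportions = to_proportions(content_ratings)
c_ratings_percentages = {
    key: proportion * 100 for key, proportion in c_ratings_proportions.items()
}

print(content_ratings)           # the original counts are still intact
print(c_ratings_percentages)
```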
Let’s now build a frequency table for the rating_count_tot column, which tells us how many user ratings each app has received in total.
This time, we’ll:
- Extract the data
- Find the min and max values
- Choose intervals
- Count how many ratings fall into each interval
Understanding the Need
Just like with app sizes, the rating_count_tot values are all over the place, from 0 to millions.
Instead of analyzing every single number, we group the values into ranges (intervals) to make the data easier to interpret.
We want to end up with a table like this:
User Ratings | Frequency |
---|---|
0 – 10,000 | 1,950 |
10,000 – 100,000 | 2,010 |
100,000 – 500,000 | 1,200 |
500,000 – 1,000,000 | 680 |
1,000,000+ | 540 |
(The numbers are just examples – we’ll calculate the real ones in code.)
Step-by-Step: Total Ratings Frequency Table
Step 1: Extract the rating values
n_user_ratings = []
for row in apps_data[1:]:  # Skip the header
    rating = int(row[5])  # Index 5 = rating_count_tot
    n_user_ratings.append(rating)
Step 2: Explore the range
ratings_min = min(n_user_ratings)
ratings_max = max(n_user_ratings)
print("Minimum:", ratings_min)
print("Maximum:", ratings_max)
Once you see the range (likely 0 to over 1,000,000), you can choose meaningful intervals.
Step 3: Define the intervals
We’ll define 5 buckets for our ratings:
user_ratings_freq = {
    '0 - 10,000': 0,
    '10,000 - 100,000': 0,
    '100,000 - 500,000': 0,
    '500,000 - 1,000,000': 0,
    '1,000,000+': 0
}
Step 4: Count how many apps fall in each interval
Loop through each rating and place it into the right category:
for row in apps_data[1:]:
    rating_count = int(row[5])
    if rating_count <= 10_000:
        user_ratings_freq['0 - 10,000'] += 1
    elif rating_count <= 100_000:
        user_ratings_freq['10,000 - 100,000'] += 1
    elif rating_count <= 500_000:
        user_ratings_freq['100,000 - 500,000'] += 1
    elif rating_count <= 1_000_000:
        user_ratings_freq['500,000 - 1,000,000'] += 1
    else:
        user_ratings_freq['1,000,000+'] += 1
Step 5: Display the frequency table
print(user_ratings_freq)
The printed dictionary has this shape (the counts below are illustrative placeholders; your output will show the real counts from the dataset):
{
    '0 - 10,000': 2178,
    '10,000 - 100,000': 1856,
    '100,000 - 500,000': 1290,
    '500,000 - 1,000,000': 789,
    '1,000,000+': 484
}
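Since the interval labels share their boundaries (10,000 appears in two labels), it helps to check where boundary values actually land. The sketch below moves the same if/elif chain into a hypothetical helper, interval_label, and runs it on made-up values chosen to sit on and around the boundaries. Because the comparisons use <=, a value exactly on a boundary falls into the lower interval.

```python
def interval_label(rating_count):
    """Return the interval name for one rating count (hypothetical helper)."""
    if rating_count <= 10_000:
        return '0 - 10,000'
    elif rating_count <= 100_000:
        return '10,000 - 100,000'
    elif rating_count <= 500_000:
        return '100,000 - 500,000'
    elif rating_count <= 1_000_000:
        return '500,000 - 1,000,000'
    return '1,000,000+'

# Made-up values chosen to hit the boundaries; not taken from the dataset
sample_counts = [0, 10_000, 10_001, 100_000, 750_000, 1_000_000, 2_500_000]

user_ratings_freq = {}
for count in sample_counts:
    label = interval_label(count)
    user_ratings_freq[label] = user_ratings_freq.get(label, 0) + 1

print(user_ratings_freq)
```

Note that an interval with no values, such as '100,000 - 500,000' for this sample, does not appear at all when the dictionary starts empty. Initializing every interval to 0, as in Step 3, keeps empty intervals visible.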
Review
In this lesson, we took dictionaries to the next level by using them to build frequency tables. These are powerful tools that let us count and summarize how often certain values or value ranges appear in our data — a common and very useful step in any data analysis process.
Let’s recap what we’ve covered:
✅ What We Learned
1. Checking for membership using in
You can check if a specific key exists in a dictionary:
'9+' in content_ratings # ✅ True
987 in content_ratings # ❌ False (it's a value, not a key)
2. Updating values in a dictionary
Once you have a key, you can change its value:
content_ratings['4+'] = 4433
content_ratings['9+'] += 1
3. Counting frequencies using dictionaries
We learned two ways to count:
- If you know the unique values beforehand:
ratings = ['4+', '4+', '9+']
content_ratings = {'4+': 0, '9+': 0}
for r in ratings:
    content_ratings[r] += 1
- If you don’t know them:
ratings = ['4+', '4+', '9+']
content_ratings = {}
for r in ratings:
    if r in content_ratings:
        content_ratings[r] += 1
    else:
        content_ratings[r] = 1
4. Converting counts to proportions and percentages
total = 7197
for key in content_ratings:
    proportion = content_ratings[key] / total
    percentage = proportion * 100
We also learned how to keep separate dictionaries for proportions and percentages, which helps when we want to preserve the original counts:
c_ratings_proportions = {}
c_ratings_percentages = {}
for key in content_ratings:
    prop = content_ratings[key] / total
    perc = prop * 100
    c_ratings_proportions[key] = prop
    c_ratings_percentages[key] = perc
5. Grouping numeric values into intervals
When a column has too many unique numbers (like app size or user ratings), it’s helpful to create intervals:
intervals = {
    '0 - 10 MB': 0,
    '10 - 50 MB': 0,
    '50 - 100 MB': 0,
    '100 - 500 MB': 0,
    '500 MB +': 0
}
And then we loop through the data and count how many values fall in each range.
🧠 Why This Matters
You’ll use this technique a lot in data analysis:
- Summarizing categories (like genres or content ratings)
- Grouping continuous values (like prices, sizes, or user ratings)
- Creating dashboards, charts, and reports
It’s one of the first big steps in going from “raw data” to “insights.”