Lesson 7 - Beyond the Basics: Python Dictionaries and Frequency Tables

In this lesson, we’ll build on what we learned about dictionaries in the previous lesson and use them to create frequency tables — a powerful tool for summarizing data. You’ll learn how to check if a value exists in a dictionary, update values, and use dictionaries to count and analyze data in a real dataset.

We’ll work with a dataset of iOS apps from the Apple App Store, focusing on a column called cont_rating, which holds the content rating (the age recommendation) for each app. Here’s how many apps fall under each rating in that column:

Content Rating	Number of Apps
4+	4433
9+	987
12+	1155
17+	622

As you can see, most apps are rated 4+, meaning they’re suitable for ages 4 and up, while fewer apps are rated 17+.

In this lesson, we’ll learn how to count values like this using dictionaries.

Checking for Dictionary Membership

Once we’ve created a dictionary, we can check whether a particular key exists in it using the in operator. This is a very useful way to validate data before accessing or modifying it.

Let’s take a look at an example:

content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
print('12+' in content_ratings)

Output:

True

This returns True because '12+' is indeed a key in the content_ratings dictionary.

Now try checking for something that doesn’t exist:

print('10+' in content_ratings)

Output:

False

Here’s something important to remember: the in operator checks only the keys, not the values. So even if a value like 4433 exists, this will return False:

print(4433 in content_ratings)  # False
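If you do need to test for a value rather than a key, one option is the dictionary's values() method. A minimal sketch, reusing the same content_ratings dictionary (the names key_check and value_check are made up for this illustration):

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# `in` on the dictionary itself looks only at the keys
key_check = 4433 in content_ratings             # False

# `in` on .values() looks at the values instead
value_check = 4433 in content_ratings.values()  # True

print(key_check, value_check)
```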

We can also use this inside if statements:

if '17+' in content_ratings:
    print("It exists")

Let’s try this in practice.

🧪 Practice Task

# Given dictionary
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# Check for dictionary membership
is_in_dictionary_1 = '9+' in content_ratings
is_in_dictionary_2 = 987 in content_ratings

# Use in with an if statement
if '17+' in content_ratings:
    result = "It exists"
    print(result)

Next, we’ll learn how to update values in a dictionary and use them for counting.

Updating Dictionary Values

Once you’ve created a dictionary, you can easily change any of the values by referencing the key you want to update. This is especially helpful when you’re trying to correct data or track counts.

Here’s how it works.

Let’s say we have a dictionary where some values are incorrect:

content_ratings = {'4+': 622, '12+': '1155', '9+': 987, '17+': 4433}

It looks like the values for '4+' and '17+' are swapped, and the value for '12+' is stored as a string instead of an integer. Let’s fix all that step by step.

Step-by-step Breakdown

Swap two values using a temporary variable:

temp = content_ratings['4+']
content_ratings['4+'] = content_ratings['17+']
content_ratings['17+'] = temp

Convert a string to an integer:

content_ratings['12+'] = int(content_ratings['12+'])

Print the result:
```
print(content_ratings)
```

Now the dictionary is correct again:

{'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}

🧪 Practice Task

Let’s do the same together.

# Given dictionary with swapped and incorrect data
content_ratings = {'4+': 622, '12+': '1155', '9+': 987, '17+': 4433}

# Step 1: Swap '4+' and '17+'
temp = content_ratings['4+']
content_ratings['4+'] = content_ratings['17+']
content_ratings['17+'] = temp

# Step 2: Convert '12+' value from string to integer
content_ratings['12+'] = int(content_ratings['12+'])

# Step 3: Print the corrected dictionary
print(content_ratings)

This kind of value updating is very common when you’re processing real-world datasets, where data might be stored in the wrong format or need adjustment.
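As an aside, Python can also swap two values in a single statement, without a temporary variable. This is a different technique from the step-by-step version above, sketched here on the same dictionary:

```python
content_ratings = {'4+': 622, '12+': '1155', '9+': 987, '17+': 4433}

# The right-hand side is evaluated first, then both keys are assigned
content_ratings['4+'], content_ratings['17+'] = (
    content_ratings['17+'],
    content_ratings['4+'],
)

# Convert the string value to an integer, as before
content_ratings['12+'] = int(content_ratings['12+'])

print(content_ratings)
# {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
```

The temporary-variable version is easier to follow when you are starting out; the one-line swap is simply shorter.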

Up next, we’ll take this one step further — using dictionaries to count things automatically.

Counting with Dictionaries

Imagine you have a list of content ratings from several apps:

ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

You want to count how many times each rating appears. You could call .count() on the list once per rating, but each call rescans the whole list. With large datasets or more complex conditions, a dictionary used as a frequency table does the job in a single pass and is more flexible.

Step-by-step: Manual Frequency Count

Let’s manually create a dictionary with keys for each unique content rating and a value of 0:

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

Now we loop through the ratings list and increment the count for each rating:

ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

Check the result:

print(content_ratings)

Output:

{'4+': 3, '9+': 2, '12+': 1, '17+': 1}

Great! We now have a dictionary that counts how many times each content rating appears.

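Let’s make it even clearer by printing the dictionary after each step of the loop. Reset the counts to zero first; otherwise this second run adds to the totals left over from the previous loop:

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    print(content_ratings)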

This helps you see how the dictionary updates in real time.

🧪 Your Turn: Use It on Real Data

Let’s now do this on real data from the Apple App Store.

We already know our content ratings: 4+, 9+, 12+, and 17+.

We’ll now read the Apple Store dataset and count how many apps fall into each category. The code expects AppleStore.csv to be in the folder you run it from:

# Step 1: Initialize dictionary with zero values
content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

# Step 2: Open and read the dataset (AppleStore.csv must be in the working directory)
from csv import reader
opened_file = open('AppleStore.csv', encoding='utf8')  # assumes the file is UTF-8 encoded
read_file = reader(opened_file)
apps_data = list(read_file)
opened_file.close()

# Step 3: Loop through each row (skip header)
for row in apps_data[1:]:
    c_rating = row[10]
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

# Step 4: See the frequency table
print(content_ratings)

Expected output:

{'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

Awesome, we just built a frequency table with Python dictionaries!
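One thing worth noticing about this approach: the if check silently skips any rating that is not already a key in the dictionary. The sketch below shows that behaviour on a tiny in-memory stand-in for the CSV file. The rows, the '21+' rating, and the skipped counter are all made up for illustration; only column index 10 carries meaning.

```python
from csv import reader
from io import StringIO

# Hypothetical stand-in for AppleStore.csv: a header plus four made-up rows.
# Only column index 10 (the content rating) matters for this sketch.
filler = ','.join(['x'] * 10)            # ten placeholder columns (indexes 0-9)
sample_csv = StringIO(
    filler + ',cont_rating\n'
    + filler + ',4+\n'
    + filler + ',4+\n'
    + filler + ',17+\n'
    + filler + ',21+\n'                  # a rating the dictionary does not know
)

apps_data = list(reader(sample_csv))

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}
skipped = 0

for row in apps_data[1:]:                # skip the header row
    c_rating = row[10]
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    else:
        skipped += 1                     # unknown ratings are not counted

print(content_ratings)  # {'4+': 2, '9+': 0, '12+': 0, '17+': 1}
print(skipped)          # 1
```

Keeping a count of skipped rows is one simple way to notice when the data contains categories you did not plan for.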

Finding Unique Values While Counting

In the previous example, we had prior knowledge of all the unique content ratings (4+, 9+, 12+, 17+), so we created a dictionary with those as keys. But what if you’re working with a column and you don’t know in advance what values are in it?

Let’s simulate that with our same ratings list:

ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

We’ll start with an empty dictionary, and for each content rating we see, we’ll check:

If the key already exists, increment its count.
If the key doesn’t exist, add it and start the count at 1.

Here’s how we do that:

content_ratings = {}

for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    else:
        content_ratings[c_rating] = 1

Check the output:

print(content_ratings)

Result:

{'4+': 3, '9+': 2, '12+': 1, '17+': 1}

✅ Why `1`, not `0`?

We start at 1 (not 0) when we add a new key because we’re counting the first occurrence of that value. It’s already appeared once by the time we find it.
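The same logic can be written more compactly with the dictionary's get() method, which returns a default when the key is missing. It is an alternative to the if/else version, shown here as a sketch on the same ratings list:

```python
ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']
content_ratings = {}

for c_rating in ratings:
    # .get() returns the current count, or 0 if the key is not there yet
    content_ratings[c_rating] = content_ratings.get(c_rating, 0) + 1

print(content_ratings)
# {'4+': 3, '9+': 2, '12+': 1, '17+': 1}
```

Here the default of 0 plays the role of "not seen yet", and adding 1 records the first occurrence, so the result matches the if/else version.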

Let’s do it on the actual dataset!

This time we’ll apply this logic to the Apple App Store data — but we won’t assume which ratings are in there.

# Step 1: Create an empty dictionary
content_ratings = {}

# Step 2: Open and read the dataset (AppleStore.csv must be in the working directory)
from csv import reader
opened_file = open('AppleStore.csv', encoding='utf8')  # assumes the file is UTF-8 encoded
read_file = reader(opened_file)
apps_data = list(read_file)
opened_file.close()

# Step 3: Loop through rows (skip header)
for row in apps_data[1:]:
    c_rating = row[10]
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    else:
        content_ratings[c_rating] = 1

# Step 4: Print the result
print(content_ratings)

This will give you the real content rating frequency table — even if there are new or unexpected ratings in the data.
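A quick way to sanity-check a frequency table like this is to compare it with collections.Counter from the standard library, and to confirm that the counts add up to the number of items counted. The sketch below uses the short ratings list instead of the CSV file so that it runs on its own; the name counted is made up for this example.

```python
from collections import Counter

ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

# Manual frequency table, as in the lesson
content_ratings = {}
for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    else:
        content_ratings[c_rating] = 1

# Counter builds the same table in one call
counted = Counter(ratings)

print(dict(counted) == content_ratings)  # True
print(sum(content_ratings.values()))     # 7, one count per item in the list
```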

Seeing Changes Over Time

To better understand what’s going on as the loop runs, you can add a print statement inside the loop:

    print(content_ratings)

That way, you’ll see how the dictionary grows with each new app row — it’s like watching the frequency table build itself step-by-step.

Now that we’ve built a frequency table using a dictionary, let’s take the next step — converting those raw counts into proportions and percentages.

This helps us answer questions like:

What fraction of all apps are rated 4+?
What percentage of apps are suitable for a 15-year-old?

From Counts to Proportions and Percentages

Here’s our frequency table again (we’ll hard-code it here for simplicity):

content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197
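Rather than typing the total by hand, you can derive it from the table itself, assuming every app appears in exactly one category. A minimal sketch:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# Adding up every count gives the total number of apps in the table
total_number_of_apps = sum(content_ratings.values())

print(total_number_of_apps)  # 7197
```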

🔸 Proportion

A proportion is the part relative to the total. For example:

proportion_4_plus = 4433 / 7197  # about 0.6160 (around 62%)

But doing that manually for each value is a pain. Instead, let’s use a loop to update every value in the dictionary:

for rating in content_ratings:
    content_ratings[rating] /= total_number_of_apps

Now each value is a proportion between 0 and 1.

🔸 Percentage

Want to see the values as percentages? Multiply each proportion by 100:

for rating in content_ratings:
    content_ratings[rating] *= 100

Now each dictionary value represents a percentage.
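Percentages computed this way carry many decimal places. If shorter numbers are easier to read, the built-in round() can trim them. The sketch below starts from the original counts and rounds each percentage to two decimal places:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197

for rating in content_ratings:
    percentage = content_ratings[rating] / total_number_of_apps * 100
    # round() keeps two decimal places for display
    content_ratings[rating] = round(percentage, 2)

print(content_ratings)
# {'4+': 61.6, '9+': 13.71, '12+': 16.05, '17+': 8.64}
```

Python drops the trailing zero when printing, so 61.60 appears as 61.6. Rounding is best left to the display step, since rounded values add up less precisely than the originals.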

Full example:

content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197

for rating in content_ratings:
    content_ratings[rating] /= total_number_of_apps
    content_ratings[rating] *= 100

print(content_ratings)

You’ll get something like this (shown here rounded to two decimal places; the actual output has many more digits):

{'4+': 61.60, '9+': 13.71, '12+': 16.05, '17+': 8.64}

Let’s Answer Two Real Questions

✅ 1. What percentage of apps are rated `17+`?

percentage_17_plus = content_ratings['17+']

Result: 8.64% of all apps are 17+.

✅ 2. What percentage of apps can a 15-year-old use?

A 15-year-old can use apps rated 4+, 9+, and 12+. So we add those percentages:

percentage_15_allowed = (
    content_ratings['4+'] +
    content_ratings['9+'] +
    content_ratings['12+']
)

Or:

percentage_15_allowed = 100 - content_ratings['17+']

Either way, it gives you around 91.36%.
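To confirm that both routes agree, you can compute them side by side. Floating-point arithmetic can make the last digits differ slightly, so the sketch below compares them with math.isclose instead of ==. The percentages dictionary and the names by_adding and by_subtracting are made up for this example.

```python
import math

content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197

# Build percentages in a separate dictionary so the counts stay available
percentages = {}
for rating in content_ratings:
    percentages[rating] = content_ratings[rating] / total_number_of_apps * 100

by_adding = percentages['4+'] + percentages['9+'] + percentages['12+']
by_subtracting = 100 - percentages['17+']

# Compare with a tolerance rather than exact equality
print(math.isclose(by_adding, by_subtracting))  # True
print(round(by_adding, 2))                      # 91.36
```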

Summary

Proportion: divide each frequency by total.
Percentage: multiply each proportion by 100.
Use dictionary keys to update values in a loop.
Now we can turn raw data into insights.

Let’s now take a look at how we can keep the original frequency data intact while also computing proportions and percentages using separate dictionaries.

This is super useful if you want to refer back to the original counts later (which is very common in data analysis).

Why Separate Dictionaries?

Previously, we updated the content_ratings dictionary in-place:

content_ratings['4+'] /= total_number_of_apps

This overwrites the original value (e.g., 4433 → 0.6160).

But what if you need both the original frequency and the calculated proportion?

➡️ Solution: Create two new dictionaries:

One for proportions
One for percentages

Creating a Proportions Dictionary

Start with the original:

content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197

Now let’s create a new dictionary to store proportions:

c_ratings_proportions = {}

for key in content_ratings:
    proportion = content_ratings[key] / total_number_of_apps
    c_ratings_proportions[key] = proportion

Now c_ratings_proportions contains (values rounded to four decimal places):

{
  '4+': 0.6160,
  '12+': 0.1605,
  '9+': 0.1371,
  '17+': 0.0864
}

Creating a Percentages Dictionary

Same idea, but multiply each proportion by 100:

c_ratings_percentages = {}

for key in c_ratings_proportions:
    percentage = c_ratings_proportions[key] * 100
    c_ratings_percentages[key] = percentage

Now you have (values rounded to two decimal places):

{
  '4+': 61.60,
  '12+': 16.05,
  '9+': 13.71,
  '17+': 8.64
}

Or Do Both in One Loop?

You can also create both dictionaries at once:

c_ratings_proportions = {}
c_ratings_percentages = {}

for key in content_ratings:
    proportion = content_ratings[key] / total_number_of_apps
    percentage = proportion * 100

    c_ratings_proportions[key] = proportion
    c_ratings_percentages[key] = percentage

This avoids looping through twice — more efficient!

Final Code Summary

content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197

c_ratings_proportions = {}
c_ratings_percentages = {}

for key in content_ratings:
    proportion = content_ratings[key] / total_number_of_apps
    percentage = proportion * 100

    c_ratings_proportions[key] = proportion
    c_ratings_percentages[key] = percentage

print(c_ratings_proportions)
print(c_ratings_percentages)
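Two quick checks can confirm that the separate dictionaries behave as intended: the original counts should be unchanged, and the proportions should add up to 1 (the percentages to 100). A sketch, using math.isclose to allow for floating-point rounding:

```python
import math

content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197

c_ratings_proportions = {}
c_ratings_percentages = {}

for key in content_ratings:
    proportion = content_ratings[key] / total_number_of_apps
    c_ratings_proportions[key] = proportion
    c_ratings_percentages[key] = proportion * 100

# The original counts are untouched
print(content_ratings['4+'])                                     # 4433

# Proportions should sum to 1 and percentages to 100, within float tolerance
print(math.isclose(sum(c_ratings_proportions.values()), 1.0))    # True
print(math.isclose(sum(c_ratings_percentages.values()), 100.0))  # True
```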

Let’s now build a frequency table for the rating_count_tot column, which tells us how many user ratings each app has received in total.

This time, we’ll:

Extract the data
Find the min and max values
Choose intervals
Count how many ratings fall into each interval

Understanding the Need

Just like with app sizes, the rating_count_tot values are all over the place — from 0 to millions.

Instead of analyzing every single number, we group the values into ranges (intervals) to make the data easier to interpret.

We want to end up with a table like this:

User Ratings	Frequency
0 – 10,000	1,950
10,000 – 100,000	2,010
100,000 – 500,000	1,200
500,000 – 1,000,000	680
1,000,000+	540

(The numbers are just examples – we’ll calculate the real ones in code.)

Step-by-Step: Total Ratings Frequency Table

Step 1: Extract the rating values

n_user_ratings = []

for row in apps_data[1:]:  # Skip the header
    rating = int(row[5])   # Index 5 = rating_count_tot
    n_user_ratings.append(rating)

Step 2: Explore the range

ratings_min = min(n_user_ratings)
ratings_max = max(n_user_ratings)

print("Minimum:", ratings_min)
print("Maximum:", ratings_max)

Once you see the range (likely 0 to over 1,000,000), you can choose meaningful intervals.

Step 3: Define the intervals

We’ll define 5 buckets for our ratings:

user_ratings_freq = {
    '0 - 10,000': 0,
    '10,000 - 100,000': 0,
    '100,000 - 500,000': 0,
    '500,000 - 1,000,000': 0,
    '1,000,000+': 0
}

Step 4: Count how many apps fall in each interval

Loop through each rating and place it into the right category:

for rating_count in n_user_ratings:  # the list we built in Step 1
    if rating_count <= 10_000:
        user_ratings_freq['0 - 10,000'] += 1
    elif rating_count <= 100_000:
        user_ratings_freq['10,000 - 100,000'] += 1
    elif rating_count <= 500_000:
        user_ratings_freq['100,000 - 500,000'] += 1
    elif rating_count <= 1_000_000:
        user_ratings_freq['500,000 - 1,000,000'] += 1
    else:
        user_ratings_freq['1,000,000+'] += 1

Step 5: Display the frequency table

print(user_ratings_freq)

You’ll see a dictionary shaped like this. The counts below are illustrative placeholders, not real output, so your values will differ:

{
  '0 - 10,000': 2178,
  '10,000 - 100,000': 1856,
  '100,000 - 500,000': 1290,
  '500,000 - 1,000,000': 789,
  '1,000,000+': 484
}
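Because each condition uses <=, a value that sits exactly on a boundary lands in the lower interval: 10,000 counts toward '0 - 10,000', not '10,000 - 100,000'. The sketch below makes that visible with a handful of made-up rating counts chosen to sit on and around the interval edges:

```python
# Hypothetical rating counts, chosen to sit on and around the interval edges
n_user_ratings = [0, 10_000, 10_001, 100_000, 250_000, 1_000_000, 2_500_000]

user_ratings_freq = {
    '0 - 10,000': 0,
    '10,000 - 100,000': 0,
    '100,000 - 500,000': 0,
    '500,000 - 1,000,000': 0,
    '1,000,000+': 0
}

for rating_count in n_user_ratings:
    if rating_count <= 10_000:
        user_ratings_freq['0 - 10,000'] += 1
    elif rating_count <= 100_000:
        user_ratings_freq['10,000 - 100,000'] += 1
    elif rating_count <= 500_000:
        user_ratings_freq['100,000 - 500,000'] += 1
    elif rating_count <= 1_000_000:
        user_ratings_freq['500,000 - 1,000,000'] += 1
    else:
        user_ratings_freq['1,000,000+'] += 1

print(user_ratings_freq)
# {'0 - 10,000': 2, '10,000 - 100,000': 2, '100,000 - 500,000': 1,
#  '500,000 - 1,000,000': 1, '1,000,000+': 1}

# Every value lands in exactly one interval, so the counts add up to the list length
print(sum(user_ratings_freq.values()) == len(n_user_ratings))  # True
```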

Review

In this lesson, we took dictionaries to the next level by using them to build frequency tables. These are powerful tools that let us count and summarize how often certain values or value ranges appear in our data — a common and very useful step in any data analysis process.

Let’s recap what we’ve covered:

✅ What We Learned

1. Checking for membership using in

You can check if a specific key exists in a dictionary:

'9+' in content_ratings     # ✅ True
987 in content_ratings      # ❌ False (it's a value, not a key)

2. Updating values in a dictionary

Once you have a key, you can change its value:

content_ratings['4+'] = 4433
content_ratings['9+'] += 1

3. Counting frequencies using dictionaries

We learned two ways to count:

If you know the unique values beforehand:

ratings = ['4+', '4+', '9+']
content_ratings = {'4+': 0, '9+': 0}
for r in ratings:
    content_ratings[r] += 1

If you don’t know them:

ratings = ['4+', '4+', '9+']
content_ratings = {}
for r in ratings:
    if r in content_ratings:
        content_ratings[r] += 1
    else:
        content_ratings[r] = 1

4. Converting counts to proportions and percentages

total = 7197
for key in content_ratings:
    proportion = content_ratings[key] / total
    percentage = proportion * 100

We also learned how to keep separate dictionaries for proportions and percentages, which helps when we want to preserve the original counts:

c_ratings_proportions = {}
c_ratings_percentages = {}

for key in content_ratings:
    prop = content_ratings[key] / total
    perc = prop * 100
    c_ratings_proportions[key] = prop
    c_ratings_percentages[key] = perc

5. Grouping numeric values into intervals

When a column has too many unique numbers (like app size or user ratings), it’s helpful to create intervals:

intervals = {
  '0 - 10 MB': 0,
  '10 - 50 MB': 0,
  '50 - 100 MB': 0,
  '100 - 500 MB': 0,
  '500 MB +': 0
}

And then we loop through the data and count how many values fall in each range.

🧠 Why This Matters

You’ll use this technique a lot in data analysis:

Summarizing categories (like genres or content ratings)
Grouping continuous values (like prices, sizes, or user ratings)
Creating dashboards, charts, and reports

It’s one of the first big steps in going from “raw data” to “insights.”
