Lesson 1 - NumPy Essentials and 1D Arrays

Welcome to NumPy

This lesson introduces you to NumPy, the cornerstone library for numerical computing in Python. You will learn what makes NumPy special, where its speed comes from, and how to create and work with one-dimensional arrays.

By the end of this lesson, you will be able to:

Understand what NumPy is and why it is essential for data analytics
Import NumPy and create one-dimensional arrays
Explain vectorization and why it is faster than loops
Access individual elements and slices from arrays
Understand array properties like shape, size, and data type

No prior NumPy experience is needed. Let’s begin your journey into efficient numerical computing.

What is NumPy?

NumPy stands for Numerical Python. It is the foundational library for scientific computing and data analysis in Python. When you work with numbers in Python, especially large datasets, NumPy is the tool you need.

Why NumPy Matters

NumPy provides several critical advantages:

Speed: vectorized NumPy operations are often 10 to 100 times faster than equivalent pure-Python loops
Arrays: Powerful multi-dimensional array objects for storing data
Mathematics: Comprehensive built-in mathematical functions
Foundation: Pandas, SciPy, and scikit-learn are all built on top of NumPy

NumPy is used throughout the Python data ecosystem. It is not optional; it is fundamental.

Importing NumPy

The standard way to import NumPy uses the alias np:

import numpy as np

print("NumPy version:", np.__version__)

This convention is universal. You will see np used in documentation, tutorials, and production code everywhere.

The Secret to NumPy’s Speed

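Python is a high-level language, which means it is easy to write and understand. However, this convenience comes at a cost: pure Python is slow when processing large amounts of data, because the interpreter handles every value as a separate object and checks its type on each operation.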

NumPy addresses this by storing array elements in a single block of memory with one fixed data type and running the actual computation in precompiled C code. C is a low-level compiled language that runs much faster than interpreted Python. NumPy gives you the best of both worlds: you write simple Python code, but the heavy lifting happens at C speed.
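One concrete reason the C code can be fast is how a NumPy array stores its data: every element has the same fixed-size type, and the values sit side by side in one block of memory. The short sketch below inspects this with the array attributes itemsize, nbytes, and flags; the example values are arbitrary, and the dtype is set explicitly so the result does not depend on your platform.

```python
import numpy as np

# Five 64-bit integers (dtype set explicitly for a platform-independent result)
values = np.array([10, 20, 30, 40, 50], dtype=np.int64)

# Every element uses the same number of bytes...
print("Bytes per element:", values.itemsize)        # 8
# ...so the whole buffer is simply size * itemsize bytes
print("Total bytes:", values.nbytes)                # 40
# The elements are stored contiguously in memory
print("Contiguous:", values.flags["C_CONTIGUOUS"])  # True
```

A Python list, by contrast, holds references to separate Python objects, each carrying its own type information, which is part of why looping over it is slower.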

Understanding Vectorization

Vectorization is the key concept that makes NumPy fast. Understanding this will change how you think about data processing.

The Problem with Loops

When you use Python lists and loops, you process data one element at a time. This approach is slow because Python must interpret each operation individually.

Here is an example of adding two lists using a loop:

# Add two lists using a loop (SLOW approach)
list_a = [10, 20, 30, 40, 50]
list_b = [1, 2, 3, 4, 5]

result = []
for i in range(len(list_a)):
    result.append(list_a[i] + list_b[i])

print("Loop result:", result)
# Output: [11, 22, 33, 44, 55]

This works, but imagine doing this with millions of numbers. Each addition requires Python to:

Access the element from list_a
Access the element from list_b
Add them together
Append to the result list

All of this happens one element at a time, in sequence.

The Solution: Vectorization

Vectorization means expressing an operation on entire arrays instead of on individual elements. NumPy then runs the element-by-element work inside optimized, compiled C code rather than in the Python interpreter.

Here is the same operation with NumPy:

# Add two arrays using vectorization (FAST approach)
array_a = np.array([10, 20, 30, 40, 50])
array_b = np.array([1, 2, 3, 4, 5])

result = array_a + array_b

print("Vectorized result:", result)
# Output: [11 22 33 44 55]

Notice how simple this is. One line of code replaces the entire loop. More importantly, it runs much faster on large arrays, because the loop now happens in compiled code.
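The same whole-array style works for the other arithmetic and comparison operators, not just +. A brief sketch reusing the two arrays from the example above:

```python
import numpy as np

array_a = np.array([10, 20, 30, 40, 50])
array_b = np.array([1, 2, 3, 4, 5])

# Each operator is applied element by element, with no explicit loop
print(array_a - array_b)  # [ 9 18 27 36 45]
print(array_a * array_b)  # [ 10  40  90 160 250]
# A single number is applied to every element
print(array_a * 2)        # [ 20  40  60  80 100]
# Comparisons produce an array of True/False values
print(array_a > 25)       # [False False  True  True  True]
```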

Understanding SIMD

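Part of the speedup comes from a CPU feature called SIMD, which stands for Single Instruction, Multiple Data. On processors that support it, NumPy's compiled loops for many common operations can use one instruction to process several values at once, for example adding several pairs of numbers in a single step. Even without SIMD, vectorized code is much faster than a Python loop, because the loop runs in compiled C instead of in the interpreter.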

Think of it like this:

Loop Approach (one pair per step):
Step 1: Add 10 + 1
Step 2: Add 20 + 2
Step 3: Add 30 + 3
Step 4: Add 40 + 4
Step 5: Add 50 + 5

SIMD Approach (several pairs per step, four in this illustration):
Step 1: Add [10, 20, 30, 40] + [1, 2, 3, 4]
Step 2: Add [50] + [5]

A single instruction handles several pairs, so fewer steps are needed. How many values fit in one step depends on the CPU and on the data type.

Speed Comparison

Let’s measure the actual speed difference:

import time
import numpy as np

# Create large datasets
size = 1_000_000
python_list_a = list(range(size))
python_list_b = list(range(size))
numpy_array_a = np.array(python_list_a)
numpy_array_b = np.array(python_list_b)

# time.perf_counter() is a high-resolution clock meant for timing short intervals

# Python loop method (a list comprehension still adds one pair at a time)
start = time.perf_counter()
result_loop = [python_list_a[i] + python_list_b[i] for i in range(size)]
loop_time = time.perf_counter() - start

# NumPy vectorized method
start = time.perf_counter()
result_numpy = numpy_array_a + numpy_array_b
numpy_time = time.perf_counter() - start

print(f"Python loop:      {loop_time:.4f} seconds")
print(f"NumPy vectorized: {numpy_time:.4f} seconds")
print(f"NumPy is {loop_time/numpy_time:.1f}x faster!")

Exact numbers depend on your hardware, NumPy version, and array size, but on typical machines the vectorized version is often tens of times faster. This difference becomes critical when working with real datasets containing millions of data points.
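A single measurement can be noisy, because other programs on your machine compete for the CPU. A slightly more robust sketch uses the standard-library timeit module to run each version several times and keep the fastest run. The array size and repetition count here are arbitrary choices, and your timings will differ.

```python
import timeit
import numpy as np

size = 1_000_000
list_a = list(range(size))
list_b = list(range(size))
arr_a = np.arange(size)
arr_b = np.arange(size)

# Run each version 5 times (one call per run) and keep the fastest run
loop_best = min(timeit.repeat(lambda: [a + b for a, b in zip(list_a, list_b)],
                              repeat=5, number=1))
numpy_best = min(timeit.repeat(lambda: arr_a + arr_b, repeat=5, number=1))

print(f"Python loop (best of 5):      {loop_best:.4f} s")
print(f"NumPy vectorized (best of 5): {numpy_best:.4f} s")
print(f"Speedup: {loop_best / numpy_best:.1f}x")
```

Taking the minimum rather than the average is a common choice, since the fastest run is the one least disturbed by background activity.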

Creating One-Dimensional Arrays

A one-dimensional (1D) array is like a list, but with NumPy’s performance benefits. Let’s learn different ways to create them.

From Python Lists

The most straightforward way to create a NumPy array is from a Python list:

# Create array from list
scores = np.array([85, 92, 78, 90, 88])

print("Array:", scores)
print("Type:", type(scores))

Output:

Array: [85 92 78 90 88]
Type: <class 'numpy.ndarray'>

The type numpy.ndarray stands for “NumPy n-dimensional array.” Even though we are working with 1D arrays now, the same object type handles arrays with any number of dimensions.

Array Properties

Every NumPy array has important properties that describe it:

scores = np.array([85, 92, 78, 90, 88])

# Shape: dimensions of the array
print("Shape:", scores.shape)
# Output: (5,)
# This means a 1D array with 5 elements

# Data type: what kind of numbers are stored
print("Data type:", scores.dtype)
# Output: int64 (64-bit integer) on most systems; NumPy versions before 2.0 on Windows default to int32

# Size: total number of elements
print("Size:", scores.size)
# Output: 5

# Length (also works with arrays)
print("Length:", len(scores))
# Output: 5

Understanding these properties helps you know what you are working with, especially when debugging or working with real datasets.
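Because shape is a tuple, its entries can be read by position; for a 1D array, shape[0] is the element count. The sketch below also uses ndim, the attribute that reports the number of dimensions.

```python
import numpy as np

scores = np.array([85, 92, 78, 90, 88])

print(scores.shape)     # (5,)  a one-item tuple
print(scores.shape[0])  # 5     the length of the only dimension
print(scores.ndim)      # 1     number of dimensions

# For a 1D array, shape[0], size, and len() all agree
print(scores.shape[0] == scores.size == len(scores))  # True
```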

Different Ways to Create Arrays

NumPy provides several convenient functions for creating arrays:

# Array from a list of floating-point numbers
temperatures = np.array([25.5, 30.2, 28.8, 31.0, 27.5])
print("Temperatures:", temperatures)

# Array of zeros
zeros = np.zeros(5)
print("Zeros:", zeros)
# Output: [0. 0. 0. 0. 0.]

# Array of ones
ones = np.ones(4)
print("Ones:", ones)
# Output: [1. 1. 1. 1.]

# Range of numbers (similar to Python's range)
numbers = np.arange(0, 10)
print("Range 0-9:", numbers)
# Output: [0 1 2 3 4 5 6 7 8 9]

# Range with step size
evens = np.arange(0, 11, 2)
print("Even numbers:", evens)
# Output: [ 0  2  4  6  8 10]

These functions are useful when you need to initialize arrays with specific patterns before filling them with real data.
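np.zeros and np.ones produce float64 arrays by default, which is why their output above shows decimal points. Both accept a dtype argument when you want another type, and np.arange also accepts a fractional step. A short sketch:

```python
import numpy as np

# Request integer zeros instead of the default float64
int_zeros = np.zeros(3, dtype=np.int64)
print(int_zeros, int_zeros.dtype)    # [0 0 0] int64

# Ones stored as 64-bit floats (the default, shown explicitly here)
float_ones = np.ones(3, dtype=np.float64)
print(float_ones, float_ones.dtype)  # [1. 1. 1.] float64

# arange with a float step; the stop value 1.0 is excluded
quarters = np.arange(0.0, 1.0, 0.25)
print(quarters)                      # [0.   0.25 0.5  0.75]
```

With a fractional step, floating-point rounding can occasionally make the number of elements differ from what you expect, so integer steps are the safer choice when an exact count matters.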

Data Types Matter

NumPy arrays can hold different types of numbers:

# Integer array
int_array = np.array([1, 2, 3, 4])
print("Integer array dtype:", int_array.dtype)
# Output: int64 (int32 on Windows with NumPy versions before 2.0)

# Float array (has decimal points)
float_array = np.array([1.0, 2.0, 3.0, 4.0])
print("Float array dtype:", float_array.dtype)
# Output: float64

# Mixed types - NumPy converts to the most general type
mixed = np.array([1, 2.5, 3, 4.8])
print("Mixed array:", mixed)
print("Type:", mixed.dtype)
# Output: [1.  2.5 3.  4.8]
# Output: float64

When you mix integers and floats, NumPy converts everything to floats, because a float can hold whole-number values like 3.0 while an integer type cannot hold 2.5. This automatic type conversion is called type promotion.
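You can ask NumPy which type two dtypes promote to with np.promote_types, choose a type yourself with the dtype argument, or convert an existing array with astype. A small sketch; note that converting floats to integers with astype drops the fractional part rather than rounding.

```python
import numpy as np

# Which type can hold both an int64 and a float64?
print(np.promote_types(np.int64, np.float64))  # float64

# Choose the type up front: whole numbers stored as floats
as_floats = np.array([1, 2, 3], dtype=np.float64)
print(as_floats)                               # [1. 2. 3.]

# Convert an existing array; float -> int drops the fractional part
mixed = np.array([1, 2.5, 3, 4.8])
print(mixed.astype(np.int64))                  # [1 2 3 4]
```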

Real-World Example

Let’s create an array to hold student exam scores:

# Student exam scores
exam_scores = np.array([85, 92, 78, 88, 95, 72, 89, 91, 83, 87])

print("Exam scores:", exam_scores)
print(f"Number of students: {exam_scores.size}")
print(f"Data type: {exam_scores.dtype}")
print(f"Shape: {exam_scores.shape}")

This simple array already demonstrates NumPy’s power. You can now perform calculations on all scores simultaneously, which you will learn in later lessons.
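As a brief preview, using only the exam_scores array defined above, here is what such whole-array calculations look like. Each line works on all ten scores at once without a loop.

```python
import numpy as np

exam_scores = np.array([85, 92, 78, 88, 95, 72, 89, 91, 83, 87])

# Add 5 bonus points to every score in one operation
curved = exam_scores + 5
print(curved)              # [ 90  97  83  93 100  77  94  96  88  92]

# Summary statistics computed over the whole array
print(exam_scores.mean())  # 86.0
print(exam_scores.max())   # 95
print(exam_scores.min())   # 72
```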

Accessing Array Elements

Once you have an array, you need to retrieve specific values from it. NumPy provides powerful indexing and slicing capabilities.

Single Element Access

You access elements using square brackets and indices, just like Python lists. Remember that indexing starts at 0:

# Create sample array
data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Positive indices (count from the beginning)
print("Element at index 0:", data[0])    # 10
print("Element at index 5:", data[5])    # 60

# Negative indices (count from the end)
print("Last element:", data[-1])         # 100
print("Second to last:", data[-2])       # 90
print("Third from end:", data[-3])       # 80

Negative indexing is particularly useful when you do not know the array length but need to access elements near the end.

Here is a visual representation:

Array:   10   20   30   40   50   60   70   80   90  100
Index:    0    1    2    3    4    5    6    7    8    9
Index:  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1
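In other words, a negative index -k refers to position len(data) - k. Indexing past either end raises an IndexError rather than returning a value, as this sketch shows.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# data[-k] is the same element as data[len(data) - k]
print(data[-1] == data[len(data) - 1])  # True
print(data[-3] == data[len(data) - 3])  # True

# Valid indices run from -10 to 9 for this 10-element array
try:
    data[10]
except IndexError as err:
    print("IndexError:", err)
```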

Slicing Arrays

Slicing allows you to extract multiple elements at once. The syntax is array[start:end], where start is included but end is not.

numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# Basic slicing
print("First 3 elements:", numbers[0:3])
# Output: [ 0 10 20]

print("Elements 2-5:", numbers[2:6])
# Output: [20 30 40 50]

print("Last 3 elements:", numbers[-3:])
# Output: [70 80 90]

Visual representation:

numbers[2:6] extracts:

Index:     0   1   2   3   4   5   6   7   8   9
Array:    [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
                  ^   ^   ^   ^
                  Start at 2, stop before 6
Result: [20, 30, 40, 50]

Slicing Shortcuts

Python provides convenient shortcuts for common slicing patterns:

numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# From start to a specific index
print("From start to index 5:", numbers[:5])
# Output: [ 0 10 20 30 40]

# From a specific index to end
print("From index 6 to end:", numbers[6:])
# Output: [60 70 80 90]

# All elements (a view of the same data, not a copy; use numbers.copy() for an independent copy)
print("All elements:", numbers[:])
# Output: [ 0 10 20 30 40 50 60 70 80 90]
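One difference from Python lists is worth knowing early: slicing a NumPy array returns a view that shares memory with the original, while slicing a list produces a new list. Changing the view therefore changes the original array. A minimal sketch, using np.shares_memory to check whether two arrays overlap:

```python
import numpy as np

numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# A slice is a view: it shares memory with numbers
first_three = numbers[:3]
first_three[0] = 999
print(numbers[:4])  # [999  10  20  30]  the original changed too

# .copy() makes an independent array
numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
independent = numbers[:3].copy()
independent[0] = 999
print(numbers[:4])  # [ 0 10 20 30]  the original is unchanged

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(numbers, numbers[:]))      # True
print(np.shares_memory(numbers, numbers.copy()))  # False
```

Views avoid copying large amounts of data, which is part of why slicing is fast; call .copy() whenever you plan to modify a slice without touching the original.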

Slicing with Steps

You can also specify a step size to skip elements:

# Syntax: array[start:end:step]
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Every 2nd element
print("Every 2nd element:", data[::2])
# Output: [0 2 4 6 8]

# Every 3rd element
print("Every 3rd element:", data[::3])
# Output: [0 3 6 9]

# Reverse the array
print("Reversed:", data[::-1])
# Output: [9 8 7 6 5 4 3 2 1 0]

# From index 1 to 8, every 2nd element
print("Index 1 to 8, step 2:", data[1:8:2])
# Output: [1 3 5 7]

The step parameter opens up many possibilities for extracting patterns from your data.
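A slice start:stop:step selects the same positions that Python's range(start, stop, step) would produce, so you can check a slice by listing those indices. Unlike single-element indexing, slice bounds that run past the end are simply clipped instead of raising an error.

```python
import numpy as np

data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# The positions picked by data[1:8:2] are exactly range(1, 8, 2)
picked = [int(data[i]) for i in range(1, 8, 2)]
print(picked)                          # [1, 3, 5, 7]
print(data[1:8:2].tolist() == picked)  # True

# Slice bounds beyond the array length are clipped, not an error
print(data[5:100])                     # [5 6 7 8 9]
```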

Practical Examples

Let’s apply these concepts to realistic scenarios:

# Monthly sales data (January to December)
sales = np.array([120, 135, 150, 145, 160, 175, 190, 185, 170, 155, 140, 165])

# First quarter (January, February, March)
q1 = sales[0:3]
print("Q1 sales:", q1)
# Output: [120 135 150]

# Last quarter (October, November, December)
q4 = sales[-3:]
print("Q4 sales:", q4)
# Output: [155 140 165]

# Summer months (June, July, August = indices 5, 6, 7)
summer = sales[5:8]
print("Summer sales:", summer)
# Output: [175 190 185]

# Every other month
bimonthly = sales[::2]
print("Bi-monthly:", bimonthly)
# Output: [120 150 160 190 170 140]

Here is another example with temperature data:

# Temperature readings (hourly for 24 hours)
temps = np.array([18, 17, 16, 15, 15, 16, 18, 21, 24, 27, 29, 31,
                  32, 33, 32, 30, 28, 26, 24, 22, 21, 20, 19, 18])

# Morning temperatures (6 AM through 11 AM = indices 6-11; the slice stops before index 12)
morning = temps[6:12]
print("Morning temps:", morning)
# Output: [18 21 24 27 29 31]

# Night temperatures (8 PM through 11 PM = indices 20-23)
night = temps[20:]
print("Night temps:", night)
# Output: [21 20 19 18]

# Every 3 hours
every_3hrs = temps[::3]
print("Every 3 hours:", every_3hrs)
# Output: [18 15 18 27 32 30 24 20]

These examples demonstrate how slicing helps you extract meaningful subsets from your data without writing loops.
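Slice positions can also be computed. Because each quarter is three consecutive months, quarter q starts at index 3 * (q - 1). The helper below, quarter, is a hypothetical name introduced only for this sketch; it simply wraps that slice.

```python
import numpy as np

# Monthly sales data (January to December), from the example above
sales = np.array([120, 135, 150, 145, 160, 175, 190, 185, 170, 155, 140, 165])

def quarter(data, q):
    """Hypothetical helper: return the three months of quarter q (1-4)."""
    start = 3 * (q - 1)  # Q1 -> 0, Q2 -> 3, Q3 -> 6, Q4 -> 9
    return data[start:start + 3]

for q in range(1, 5):
    months = quarter(sales, q)
    print(f"Q{q}: {months} total={months.sum()}")
```

Q1 and Q4 match the sales[0:3] and sales[-3:] slices shown earlier.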

Practice Exercises

Now it is your turn to apply what you have learned. Try these exercises on your own before looking at solutions.

Exercise 1: Create and Explore Arrays

Create an array of 10 student test scores. Then print:

The array itself
Its shape
Its data type
Its size

Try this on your own:

# Your code here

Hint

Use np.array() with a list of 10 numbers. Then use the .shape, .dtype, and .size attributes.

Exercise 2: Access Elements

Given this array of prices, extract:

The first element
The last element
The element at index 5 (one of the two middle elements, since the array has 10 items)

prices = np.array([10.5, 20.3, 15.8, 30.0, 25.5, 18.2, 22.7, 28.1, 19.5, 24.0])

# Your code here

Exercise 3: Slicing Practice

From the array below:

Get the first 5 elements
Get the last 3 elements
Get every 3rd element
Reverse the array

data = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

# Your code here

Hint

Use slicing syntax: array[start:end:step]. Remember that a negative step walks backward through the array, so array[::-1] reverses it.

Summary

Congratulations! You have completed your first NumPy lesson. Let’s review what you learned.

Key Concepts

NumPy Fundamentals

NumPy is Python’s library for numerical computing and data analysis
Import with import numpy as np (standard convention)
Vectorized NumPy operations are often 10-100 times faster than equivalent pure-Python loops

Vectorization

Express operations on whole arrays instead of looping over individual elements
Replaces slow Python loops with loops that run in compiled C code
Can additionally use SIMD (Single Instruction, Multiple Data) CPU instructions for extra speed
Essential for efficient data analysis

Creating Arrays

np.array([1, 2, 3]) creates an array from a list
np.zeros(5) creates an array of zeros
np.ones(4) creates an array of ones
np.arange(0, 10) creates a range of numbers

Array Properties

.shape shows array dimensions
.dtype shows the data type (int64, float64, etc.)
.size shows the total number of elements
len() also works with arrays

Accessing Elements

array[0] accesses the first element
array[-1] accesses the last element
array[2:5] slices elements from index 2 to 4
array[::2] selects every 2nd element
array[::-1] reverses the array

Why This Matters

These fundamentals form the foundation for everything you will do with data in Python. Whether you are analyzing sales data, processing sensor readings, or preparing data for machine learning, you will use these techniques constantly.

Vectorization is particularly important. As you work with larger datasets, the speed difference between loops and vectorized operations becomes critical. A task that takes 10 minutes with loops might take only 6 seconds with NumPy.

Next Steps

You now understand NumPy basics and can work with one-dimensional arrays. In the next lesson, you will learn about two-dimensional arrays and how to load real data from CSV files.

Continue Building Your Skills

You have taken your first step into the world of efficient numerical computing. The skills you learned here (creating arrays, understanding vectorization, and accessing data) will serve you throughout your entire data analytics career.

Keep practicing these concepts. The more comfortable you become with NumPy arrays, the more powerful your data analysis will be!

DATATWEETS
