Lesson 1 - NumPy Essentials and 1D Arrays
Welcome to NumPy
This lesson introduces you to NumPy, the cornerstone library for numerical computing in Python. You will learn what makes NumPy special, how it achieves incredible speed, and how to create and work with one-dimensional arrays.
By the end of this lesson, you will be able to:
- Understand what NumPy is and why it is essential for data analytics
- Import NumPy and create one-dimensional arrays
- Explain vectorization and why it is faster than loops
- Access individual elements and slices from arrays
- Understand array properties like shape, size, and data type
No prior NumPy experience is needed. Let’s begin your journey into efficient numerical computing.
What is NumPy?
NumPy stands for Numerical Python. It is Python’s most important library for scientific computing and data analysis. When you work with numbers in Python, especially large datasets, NumPy is the tool you need.
Why NumPy Matters
NumPy provides several critical advantages:
- Speed: NumPy operations are often 10 to 100 times faster than equivalent pure-Python loops, with the gap depending on the operation and the array size
- Arrays: Powerful multi-dimensional array objects for storing data
- Mathematics: Comprehensive built-in mathematical functions
- Foundation: Pandas, SciPy, and scikit-learn are all built on top of NumPy
Nearly everyone who does data science in Python relies on NumPy, either directly or through libraries built on it. It is fundamental, not optional.
Importing NumPy
The standard way to import NumPy uses the alias np:
import numpy as np
print("NumPy version:", np.__version__)
This convention is universal. You will see np used in documentation, tutorials, and production code everywhere.
The Secret to NumPy’s Speed
Python is a high-level language, which means it is easy to write and understand. However, this convenience comes at a cost—pure Python is slow when processing large amounts of data.
NumPy solves this problem by using C code under the hood. C is a low-level language that runs much faster than Python. NumPy gives you the best of both worlds: you write simple Python code, but the actual computations happen at C speed.
Understanding Vectorization
Vectorization is the key concept that makes NumPy fast. Understanding this will change how you think about data processing.
The Problem with Loops
When you use Python lists and loops, you process data one element at a time. This approach is slow because Python must interpret each operation individually.
Here is an example of adding two lists using a loop:
# Add two lists using a loop (SLOW approach)
list_a = [10, 20, 30, 40, 50]
list_b = [1, 2, 3, 4, 5]
result = []
for i in range(len(list_a)):
    result.append(list_a[i] + list_b[i])
print("Loop result:", result)
# Output: [11, 22, 33, 44, 55]
This works, but imagine doing this with millions of numbers. Each addition requires Python to:
- Access the element from list_a
- Access the element from list_b
- Add them together
- Append to the result list
All of this happens one element at a time, in sequence.
The Solution: Vectorization
Vectorization means expressing an operation on whole arrays instead of writing an explicit Python loop over individual elements. NumPy then does the element-by-element work inside optimized, compiled C code.
Here is the same operation with NumPy:
# Add two arrays using vectorization (FAST approach)
array_a = np.array([10, 20, 30, 40, 50])
array_b = np.array([1, 2, 3, 4, 5])
result = array_a + array_b
print("Vectorized result:", result)
# Output: [11 22 33 44 55]
Notice how simple this is. One line of code replaces the entire loop. More importantly, it executes much faster on large arrays.
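The + operator behaves very differently on lists and on arrays, which is easy to overlook. The short sketch below reuses the values from the examples above, shortened to three elements (the variable names are only illustrative), to show that + and * join or repeat lists but work element by element on arrays.

```python
import numpy as np

list_a = [10, 20, 30]
list_b = [1, 2, 3]

# On Python lists, + concatenates and * repeats
list_sum = list_a + list_b
list_doubled = list_a * 2

# On NumPy arrays, the same operators work element by element
array_sum = np.array(list_a) + np.array(list_b)
array_doubled = np.array(list_a) * 2

print("list + list:  ", list_sum)       # [10, 20, 30, 1, 2, 3]
print("list * 2:     ", list_doubled)   # [10, 20, 30, 10, 20, 30]
print("array + array:", array_sum)      # [11 22 33]
print("array * 2:    ", array_doubled)  # [20 40 60]
```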
Understanding SIMD
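Part of this speed comes from a hardware technique called SIMD, which stands for Single Instruction, Multiple Data. A SIMD instruction applies one operation to several values at once, so instead of issuing five separate additions, the processor can add several pairs with a single instruction. Whether SIMD is actually used depends on your CPU and on how NumPy was built; the larger and more dependable gain comes from running the loop in compiled C instead of in the Python interpreter.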
Think of it like this:
Loop Approach (Sequential):
Worker 1: Add 10 + 1
Worker 1: Add 20 + 2
Worker 1: Add 30 + 3
Worker 1: Add 40 + 4
Worker 1: Add 50 + 5
Vectorized Approach (Parallel):
Worker 1: Add 10 + 1
Worker 2: Add 20 + 2
Worker 3: Add 30 + 3
Worker 4: Add 40 + 4
Worker 5: Add 50 + 5
All workers operate simultaneously!
Speed Comparison
Let’s measure the actual speed difference:
import time
# Create large datasets
size = 1000000
python_list_a = list(range(size))
python_list_b = list(range(size))
numpy_array_a = np.array(python_list_a)
numpy_array_b = np.array(python_list_b)
# Python loop method
start = time.perf_counter()  # perf_counter is the clock intended for short timings
result_loop = [python_list_a[i] + python_list_b[i] for i in range(size)]
loop_time = time.perf_counter() - start
# NumPy vectorized method
start = time.perf_counter()
result_numpy = numpy_array_a + numpy_array_b
numpy_time = time.perf_counter() - start
print(f"Python loop: {loop_time:.4f} seconds")
print(f"NumPy vectorized: {numpy_time:.4f} seconds")
print(f"NumPy is {loop_time/numpy_time:.1f}x faster!")
On most systems, NumPy comes out roughly 10 to 100 times faster, though the exact ratio varies with your hardware, Python version, and NumPy version. This difference becomes critical when working with real datasets containing millions of data points.
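A single timing run can be noisy. As a hedged alternative, the sketch below uses the standard library's timeit module to repeat each measurement and keep the fastest trial. The array size, the repeat counts, and the helper name add_with_loop are assumptions chosen for illustration, and the ratio you see will differ from machine to machine.

```python
import timeit
import numpy as np

size = 100_000  # smaller than the example above so the repeats finish quickly
list_a = list(range(size))
list_b = list(range(size))
array_a = np.arange(size)
array_b = np.arange(size)

def add_with_loop(a, b):
    """Add two lists element by element in pure Python (hypothetical helper)."""
    return [x + y for x, y in zip(a, b)]

# Each trial runs the operation 10 times; keep the fastest of 5 trials
loop_time = min(timeit.repeat(lambda: add_with_loop(list_a, list_b), number=10, repeat=5))
numpy_time = min(timeit.repeat(lambda: array_a + array_b, number=10, repeat=5))

print(f"Python loop:      {loop_time:.4f} seconds for 10 runs")
print(f"NumPy vectorized: {numpy_time:.4f} seconds for 10 runs")
print(f"Ratio: {loop_time / numpy_time:.1f}x")
```

Taking the minimum over several trials is a common way to reduce the influence of other processes running on the machine.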
Creating One-Dimensional Arrays
A one-dimensional (1D) array is like a list, but with NumPy’s performance benefits. Let’s learn different ways to create them.
From Python Lists
The most straightforward way to create a NumPy array is from a Python list:
# Create array from list
scores = np.array([85, 92, 78, 90, 88])
print("Array:", scores)
print("Type:", type(scores))
Output:
Array: [85 92 78 90 88]
Type: <class 'numpy.ndarray'>
The type numpy.ndarray means “NumPy n-dimensional array.” Even though we are working with 1D arrays now, the same object type handles arrays of any dimension.
Array Properties
Every NumPy array has important properties that describe it:
scores = np.array([85, 92, 78, 90, 88])
# Shape: dimensions of the array
print("Shape:", scores.shape)
# Output: (5,)
# This means a 1D array with 5 elements
# Data type: what kind of numbers are stored
print("Data type:", scores.dtype)
# Output: int64 (64-bit integer) on most platforms; the default integer size can differ by platform and NumPy version
# Size: total number of elements
print("Size:", scores.size)
# Output: 5
# Length (also works with arrays)
print("Length:", len(scores))
# Output: 5
Understanding these properties helps you know what you are working with, especially when debugging or working with real datasets.
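A few related attributes can make these properties more concrete. The sketch below uses .ndim, .itemsize, and .nbytes, which are standard ndarray attributes not covered elsewhere in this lesson; the dtype is set explicitly here (an assumption for the example) so the byte counts do not depend on your platform's default integer type.

```python
import numpy as np

# dtype is set explicitly so the sizes below do not depend on the platform default
scores = np.array([85, 92, 78, 90, 88], dtype=np.int64)

print("Dimensions:", scores.ndim)             # 1
print("Bytes per element:", scores.itemsize)  # 8
print("Total bytes:", scores.nbytes)          # 40

# For a 1D array, the shape is a one-element tuple holding the size
print("Shape matches size:", scores.shape == (scores.size,))  # True
```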
Different Ways to Create Arrays
NumPy provides several convenient functions for creating arrays:
# Array from a list of floating-point numbers
temperatures = np.array([25.5, 30.2, 28.8, 31.0, 27.5])
print("Temperatures:", temperatures)
# Array of zeros
zeros = np.zeros(5)
print("Zeros:", zeros)
# Output: [0. 0. 0. 0. 0.]
# Array of ones
ones = np.ones(4)
print("Ones:", ones)
# Output: [1. 1. 1. 1.]
# Range of numbers (similar to Python's range)
numbers = np.arange(0, 10)
print("Range 0-9:", numbers)
# Output: [0 1 2 3 4 5 6 7 8 9]
# Range with step size
evens = np.arange(0, 11, 2)
print("Even numbers:", evens)
# Output: [ 0  2  4  6  8 10]
These functions are useful when you need to initialize arrays with specific patterns before filling them with real data.
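Two details of these functions are easy to miss: np.zeros and np.ones return floats unless told otherwise, and np.arange excludes its end value. The following sketch illustrates both; the variable names are illustrative only.

```python
import numpy as np

# np.zeros and np.ones produce floats by default
default_zeros = np.zeros(3)
print(default_zeros, default_zeros.dtype)  # [0. 0. 0.] float64

# Passing dtype=int asks for integers instead
int_zeros = np.zeros(3, dtype=int)
print(int_zeros)  # [0 0 0]

# With a single argument, np.arange starts at 0, like Python's range
first_five = np.arange(5)
print(first_five)  # [0 1 2 3 4]

# The end value is excluded, so the length is (end - start) / step, rounded up
odds = np.arange(1, 10, 2)
print(odds, odds.size)  # [1 3 5 7 9] 5
```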
Data Types Matter
NumPy arrays can hold different types of numbers:
# Integer array
int_array = np.array([1, 2, 3, 4])
print("Integer array dtype:", int_array.dtype)
# Output: int64
# Float array (has decimal points)
float_array = np.array([1.0, 2.0, 3.0, 4.0])
print("Float array dtype:", float_array.dtype)
# Output: float64
# Mixed types - NumPy converts to the most general type
mixed = np.array([1, 2.5, 3, 4.8])
print("Mixed array:", mixed)
print("Type:", mixed.dtype)
# Output: [1.  2.5 3.  4.8]
# Output: float64
When you mix integers and floats, NumPy automatically converts everything to floats because a float can represent both whole and fractional values. This automatic type conversion is called type promotion.
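Promotion happens automatically, but you can also control the data type yourself. The sketch below shows three standard NumPy tools for this: np.result_type to ask which type a combination would produce, the dtype argument of np.array, and the .astype() method. The sample values are made up for illustration.

```python
import numpy as np

int_array = np.array([1, 2, 3, 4])
float_array = np.array([1.0, 2.0, 3.0, 4.0])

# Ask NumPy which dtype it would promote to when combining the two
print(np.result_type(int_array.dtype, float_array.dtype))  # float64

# Request a dtype up front instead of relying on promotion
as_float = np.array([1, 2, 3, 4], dtype=float)
print(as_float)  # [1. 2. 3. 4.]

# Convert an existing array; float -> int drops the fractional part
measurements = np.array([1.9, 2.5, -1.9])
truncated = measurements.astype(int)
print(truncated)  # [ 1  2 -1]
```

Note that converting floats to integers truncates toward zero; it does not round.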
Real-World Example
Let’s create an array to hold student exam scores:
# Student exam scores
exam_scores = np.array([85, 92, 78, 88, 95, 72, 89, 91, 83, 87])
print("Exam scores:", exam_scores)
print(f"Number of students: {exam_scores.size}")
print(f"Data type: {exam_scores.dtype}")
print(f"Shape: {exam_scores.shape}")
This simple array already demonstrates NumPy’s power. You can now perform calculations on all scores at once, which you will learn in later lessons.
Accessing Array Elements
Once you have an array, you need to retrieve specific values from it. NumPy provides powerful indexing and slicing capabilities.
Single Element Access
You access elements using square brackets and indices, just like Python lists. Remember that indexing starts at 0:
# Create sample array
data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
# Positive indices (count from the beginning)
print("Element at index 0:", data[0]) # 10
print("Element at index 5:", data[5]) # 60
# Negative indices (count from the end)
print("Last element:", data[-1]) # 100
print("Second to last:", data[-2]) # 90
print("Third from end:", data[-3]) # 80
Negative indexing is particularly useful when you do not know the array length but need to access elements near the end.
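The relationship between positive and negative indices can be checked directly. The sketch below assumes the same data array as above and shows that data[-k] and data[n - k] refer to the same element, and that an index past the end raises IndexError; the names n, k, and out_of_range_raised are illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
n = len(data)

# data[-k] refers to the same element as data[n - k]
for k in (1, 2, 3):
    print(f"data[-{k}] = {data[-k]}, data[{n - k}] = {data[n - k]}")

# Valid indices run from -n to n - 1; anything else raises IndexError
try:
    data[n]
    out_of_range_raised = False
except IndexError as error:
    out_of_range_raised = True
    print("IndexError:", error)
```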
Here is a visual representation:
Array:  [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100]
Index:     0    1    2    3    4    5    6    7    8    9
Index:   -10   -9   -8   -7   -6   -5   -4   -3   -2   -1
Slicing Arrays
Slicing allows you to extract multiple elements at once. The syntax is array[start:end], where start is included but end is not.
numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
# Basic slicing
print("First 3 elements:", numbers[0:3])
# Output: [ 0 10 20]
print("Elements 2-5:", numbers[2:6])
# Output: [20 30 40 50]
print("Last 3 elements:", numbers[-3:])
# Output: [70 80 90]
Visual representation:
numbers[2:6] extracts:
Index:    0   1   2   3   4   5   6   7   8   9
Array:  [ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
                  ^   ^   ^   ^
Start at 2, stop before 6
Result: [20, 30, 40, 50]
Slicing Shortcuts
Python provides convenient shortcuts for common slicing patterns:
numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
# From start to a specific index
print("From start to index 5:", numbers[:5])
# Output: [ 0 10 20 30 40]
# From a specific index to end
print("From index 6 to end:", numbers[6:])
# Output: [60 70 80 90]
# All elements (returns a view of the same data; use numbers.copy() for an independent copy)
print("All elements:", numbers[:])
# Output: [ 0 10 20 30 40 50 60 70 80 90]
Slicing with Steps
You can also specify a step size to skip elements:
# Syntax: array[start:end:step]
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Every 2nd element
print("Every 2nd element:", data[::2])
# Output: [0 2 4 6 8]
# Every 3rd element
print("Every 3rd element:", data[::3])
# Output: [0 3 6 9]
# Reverse the array
print("Reversed:", data[::-1])
# Output: [9 8 7 6 5 4 3 2 1 0]
# From index 1 up to (but not including) index 8, every 2nd element
print("Index 1 to 8, step 2:", data[1:8:2])
# Output: [1 3 5 7]
The step parameter opens up many possibilities for extracting patterns from your data.
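A few edge cases of array[start:end:step] are worth trying for yourself. The sketch below, using the same ten-element array, illustrates a negative step with explicit bounds, bounds that run past the end of the array, and a slice that selects nothing. The variable names are illustrative.

```python
import numpy as np

data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# With a negative step, start must be the larger index; end is still excluded
descending = data[8:2:-2]
print(descending)  # [8 6 4]

# Slice bounds past the end are clipped instead of raising an error
tail = data[5:100]
print(tail)  # [5 6 7 8 9]

# With a positive step, a start at or beyond the end gives an empty array
empty = data[6:2]
print(empty, empty.size)  # [] 0
```

Unlike single-element indexing, slicing never raises IndexError for out-of-range bounds.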
Practical Examples
Let’s apply these concepts to realistic scenarios:
# Monthly sales data (January to December)
sales = np.array([120, 135, 150, 145, 160, 175, 190, 185, 170, 155, 140, 165])
# First quarter (January, February, March)
q1 = sales[0:3]
print("Q1 sales:", q1)
# Output: [120 135 150]
# Last quarter (October, November, December)
q4 = sales[-3:]
print("Q4 sales:", q4)
# Output: [155 140 165]
# Summer months (June, July, August = indices 5, 6, 7)
summer = sales[5:8]
print("Summer sales:", summer)
# Output: [175 190 185]
# Every other month
bimonthly = sales[::2]
print("Bi-monthly:", bimonthly)
# Output: [120 150 160 190 170 140]
Here is another example with temperature data:
# Temperature readings (hourly for 24 hours)
temps = np.array([18, 17, 16, 15, 15, 16, 18, 21, 24, 27, 29, 31,
                  32, 33, 32, 30, 28, 26, 24, 22, 21, 20, 19, 18])
# Morning temperatures (6 AM through 11 AM = indices 6-11)
morning = temps[6:12]
print("Morning temps:", morning)
# Output: [18 21 24 27 29 31]
# Night temperatures (8 PM through 11 PM = indices 20-23)
night = temps[20:]
print("Night temps:", night)
# Output: [21 20 19 18]
# Every 3 hours
every_3hrs = temps[::3]
print("Every 3 hours:", every_3hrs)
# Output: [18 15 18 27 32 30 24 20]
These examples demonstrate how slicing helps you extract meaningful subsets from your data without writing loops.
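One thing to keep in mind when working with subsets like these: a basic slice of a NumPy array is a view that shares memory with the original, unlike a slice of a Python list, which is a new list. The sketch below uses a shortened, made-up sales array to show the difference between a view and an explicit .copy(); np.shares_memory is a standard NumPy function, and the variable names are illustrative.

```python
import numpy as np

sales = np.array([120, 135, 150, 145, 160, 175])

# A basic slice is a view: it shares memory with the original array
q1_view = sales[0:3]
print(np.shares_memory(sales, q1_view))  # True

q1_view[0] = 999   # writing through the view...
print(sales[0])    # 999  ...changes the original

# .copy() gives an independent array
q1_copy = sales[0:3].copy()
q1_copy[1] = -1
print(sales[1])    # 135 (unchanged)
```

If you plan to modify a subset without touching the source data, taking a copy first is the safer choice.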
Practice Exercises
Now it is your turn to apply what you have learned. Try these exercises on your own before looking at solutions.
Exercise 1: Create and Explore Arrays
Create an array of 10 student test scores. Then print:
- The array itself
- Its shape
- Its data type
- Its size
Try this on your own:
# Your code here
Hint
Use np.array() with a list of 10 numbers. Then use .shape, .dtype, and .size properties.
Exercise 2: Access Elements
Given this array of prices, extract:
- The first element
- The last element
- The element at index 5 (near the middle of the array)
prices = np.array([10.5, 20.3, 15.8, 30.0, 25.5, 18.2, 22.7, 28.1, 19.5, 24.0])
# Your code here
Exercise 3: Slicing Practice
From the array below:
- Get the first 5 elements
- Get the last 3 elements
- Get every 3rd element
- Reverse the array
data = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50])
# Your code here
Hint
Use slicing syntax: array[start:end:step]. Remember that a step of -1 reverses the array.
Summary
Congratulations! You have completed your first NumPy lesson. Let’s review what you learned.
Key Concepts
NumPy Fundamentals
- NumPy is Python’s library for numerical computing and data analysis
- Import with import numpy as np (standard convention)
- NumPy is often 10 to 100 times faster than pure Python for numerical operations
Vectorization
- Process multiple data items simultaneously instead of one at a time
- Eliminates the need for slow loops
- Runs in compiled C code, which can use SIMD (Single Instruction, Multiple Data) instructions for efficiency
- Essential for efficient data analysis
Creating Arrays
- np.array([1, 2, 3]) creates an array from a list
- np.zeros(5) creates an array of zeros
- np.ones(4) creates an array of ones
- np.arange(0, 10) creates a range of numbers
Array Properties
- .shape shows array dimensions
- .dtype shows the data type (int64, float64, etc.)
- .size shows the total number of elements
- len() also works with arrays
Accessing Elements
- array[0] accesses the first element
- array[-1] accesses the last element
- array[2:5] slices elements from index 2 to 4
- array[::2] selects every 2nd element
- array[::-1] reverses the array
Why This Matters
These fundamentals form the foundation for everything you will do with data in Python. Whether you are analyzing sales data, processing sensor readings, or preparing data for machine learning, you will use these techniques constantly.
Vectorization is particularly important. As you work with larger datasets, the speed difference between loops and vectorized operations becomes critical. A task that takes 10 minutes with loops might take only 6 seconds with NumPy.
Next Steps
You now understand NumPy basics and can work with one-dimensional arrays. In the next lesson, you will learn about two-dimensional arrays and how to load real data from CSV files.
Continue Building Your Skills
You have taken your first step into the world of efficient numerical computing. The skills you learned here—creating arrays, understanding vectorization, and accessing data—will serve you throughout your entire data analytics career.
Keep practicing these concepts. The more comfortable you become with NumPy arrays, the more powerful your data analysis will be!