Lesson 1 - NumPy Essentials and 1D Arrays

Welcome to NumPy

This lesson introduces you to NumPy, the cornerstone library for numerical computing in Python. You will learn what makes NumPy special, how it achieves incredible speed, and how to create and work with one-dimensional arrays.

By the end of this lesson, you will be able to:

  • Understand what NumPy is and why it is essential for data analytics
  • Import NumPy and create one-dimensional arrays
  • Explain vectorization and why it is faster than loops
  • Access individual elements and slices from arrays
  • Understand array properties like shape, size, and data type

No prior NumPy experience is needed. Let’s begin your journey into efficient numerical computing.


What is NumPy?

NumPy stands for Numerical Python. It is Python’s most important library for scientific computing and data analysis. When you work with numbers in Python, especially large datasets, NumPy is the tool you need.

Why NumPy Matters

NumPy provides several critical advantages:

  • Speed: Vectorized NumPy operations on large arrays are often 10 to 100 times faster than equivalent pure-Python loops
  • Arrays: Powerful multi-dimensional array objects for storing data
  • Mathematics: Comprehensive built-in mathematical functions
  • Foundation: Pandas, SciPy, and scikit-learn are all built on top of NumPy

Every data scientist uses NumPy. It is not optional—it is fundamental.

Importing NumPy

The standard way to import NumPy uses the alias np:

import numpy as np

print("NumPy version:", np.__version__)

This convention is universal. You will see np used in documentation, tutorials, and production code everywhere.

The Secret to NumPy’s Speed

Python is a high-level language, which means it is easy to write and understand. However, this convenience comes at a cost—pure Python is slow when processing large amounts of data.

NumPy solves this problem by using C code under the hood. C is a low-level language that runs much faster than Python. NumPy gives you the best of both worlds: you write simple Python code, but the actual computations happen at C speed.


Understanding Vectorization

Vectorization is the key concept that makes NumPy fast. Understanding this will change how you think about data processing.

The Problem with Loops

When you use Python lists and loops, you process data one element at a time. This approach is slow because Python must interpret each operation individually.

Here is an example of adding two lists using a loop:

# Add two lists using a loop (SLOW approach)
list_a = [10, 20, 30, 40, 50]
list_b = [1, 2, 3, 4, 5]

result = []
for i in range(len(list_a)):
    result.append(list_a[i] + list_b[i])

print("Loop result:", result)
# Output: [11, 22, 33, 44, 55]

This works, but imagine doing this with millions of numbers. Each addition requires Python to:

  1. Access the element from list_a
  2. Access the element from list_b
  3. Add them together
  4. Append to the result list

All of this happens one element at a time, in sequence.
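
For comparison, the same loop is often written more compactly with zip and a list comprehension. This is only a sketch of an alternative pure-Python style, reusing the list_a and list_b values from the example above. It is shorter, but the interpreter still handles one pair of elements at a time.

```python
# Same element-by-element addition, written with zip (still pure Python)
list_a = [10, 20, 30, 40, 50]
list_b = [1, 2, 3, 4, 5]

# zip pairs up the elements; the comprehension adds each pair in turn
result = [a + b for a, b in zip(list_a, list_b)]

print("zip result:", result)
# Output: [11, 22, 33, 44, 55]
```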

The Solution: Vectorization

Vectorization means expressing an operation on a whole array at once instead of writing an explicit Python loop over its elements. NumPy then carries out the element-by-element work inside optimized, compiled C code.

Here is the same operation with NumPy:

# Add two arrays using vectorization (FAST approach)
array_a = np.array([10, 20, 30, 40, 50])
array_b = np.array([1, 2, 3, 4, 5])

result = array_a + array_b

print("Vectorized result:", result)
# Output: [11 22 33 44 55]

Notice how simple this is. One line of code replaces the entire loop. But more importantly, this executes much faster.
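
Vectorized arithmetic is not limited to adding two arrays. As a small illustration (the Celsius readings below are made-up sample values, not data from this lesson), a scalar can be combined with every element of an array in a single expression:

```python
import numpy as np

# Made-up temperature readings in degrees Celsius
celsius = np.array([0, 25, 100])

# Each scalar operation is applied to every element, with no explicit loop
fahrenheit = celsius * 9 / 5 + 32

print("Fahrenheit:", fahrenheit)
# Output: [ 32.  77. 212.]
```

The division produces floats, so the result is a float64 array even though the input held integers.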

Understanding SIMD

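Part of this speed comes from a hardware technique called SIMD, which stands for Single Instruction, Multiple Data. A SIMD instruction applies the same operation to several values at once, so instead of telling the computer to add two numbers five separate times, compiled code can add several pairs with one instruction. The rest of the gain comes mainly from running the loop in compiled C code rather than in the Python interpreter.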

Think of it like this:

Loop Approach (Sequential):
Worker 1: Add 10 + 1
Worker 1: Add 20 + 2
Worker 1: Add 30 + 3
Worker 1: Add 40 + 4
Worker 1: Add 50 + 5

Vectorized Approach (Parallel):
Worker 1: Add 10 + 1
Worker 2: Add 20 + 2
Worker 3: Add 30 + 3
Worker 4: Add 40 + 4
Worker 5: Add 50 + 5

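All workers operate simultaneously! Here the workers stand for lanes of a single CPU instruction rather than separate threads, and the number of pairs handled per instruction depends on the CPU and the data type.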

Speed Comparison

Let’s measure the actual speed difference:

import time

# Create large datasets
size = 1000000
python_list_a = list(range(size))
python_list_b = list(range(size))
numpy_array_a = np.array(python_list_a)
numpy_array_b = np.array(python_list_b)

# Python loop method
# time.perf_counter() is the clock intended for measuring short durations
start = time.perf_counter()
result_loop = [python_list_a[i] + python_list_b[i] for i in range(size)]
loop_time = time.perf_counter() - start

# NumPy vectorized method
start = time.perf_counter()
result_numpy = numpy_array_a + numpy_array_b
numpy_time = time.perf_counter() - start

print(f"Python loop:      {loop_time:.4f} seconds")
print(f"NumPy vectorized: {numpy_time:.4f} seconds")
print(f"NumPy is {loop_time/numpy_time:.1f}x faster!")

On many systems, NumPy comes out roughly 20 to 100 times faster here, although the exact ratio depends on your hardware, NumPy version, and array size. This difference becomes critical when working with real datasets containing millions of data points.
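
A single timing can be noisy. One possible refinement, sketched here with the standard library timeit module and a smaller, arbitrarily chosen array size, is to repeat each measurement and keep the fastest trial. The names add_with_loop and add_with_numpy are hypothetical helpers introduced only for this example.

```python
import timeit

import numpy as np

size = 100_000  # arbitrary size, small enough to run quickly
list_a = list(range(size))
list_b = list(range(size))
array_a = np.arange(size)
array_b = np.arange(size)

def add_with_loop():
    # Pure-Python element-by-element addition
    return [a + b for a, b in zip(list_a, list_b)]

def add_with_numpy():
    # Vectorized addition performed inside NumPy
    return array_a + array_b

# Call each function 10 times per trial, run 5 trials, keep the best trial
loop_time = min(timeit.repeat(add_with_loop, number=10, repeat=5))
numpy_time = min(timeit.repeat(add_with_numpy, number=10, repeat=5))

print(f"Python loop: {loop_time:.4f} s for 10 runs")
print(f"NumPy:       {numpy_time:.4f} s for 10 runs")
print(f"Speedup:     {loop_time / numpy_time:.1f}x")
```

Taking the minimum over several trials reduces the influence of other programs running on the machine during the measurement.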


Creating One-Dimensional Arrays

A one-dimensional (1D) array is like a list, but with NumPy’s performance benefits. Let’s learn different ways to create them.

From Python Lists

The most straightforward way to create a NumPy array is from a Python list:

# Create array from list
scores = np.array([85, 92, 78, 90, 88])

print("Array:", scores)
print("Type:", type(scores))

Output:

Array: [85 92 78 90 88]
Type: <class 'numpy.ndarray'>

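The type numpy.ndarray stands for “NumPy n-dimensional array.” This is the type of the array object itself, which is separate from the data type (dtype) of the numbers stored inside it. Even though we are working with 1D arrays now, the same object type handles arrays of any dimension.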

Array Properties

Every NumPy array has important properties that describe it:

scores = np.array([85, 92, 78, 90, 88])

# Shape: dimensions of the array
print("Shape:", scores.shape)
# Output: (5,)
# This means a 1D array with 5 elements

# Data type: what kind of numbers are stored
print("Data type:", scores.dtype)
# Output: int64 (64-bit integer) on most platforms; int32 on Windows with NumPy versions before 2.0

# Size: total number of elements
print("Size:", scores.size)
# Output: 5

# Length (also works with arrays)
print("Length:", len(scores))
# Output: 5

Understanding these properties helps you know what you are working with, especially when debugging or working with real datasets.
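
A few related attributes can also help when inspecting an array. The sketch below reuses the scores values but sets dtype explicitly, an assumption made here so that the byte counts are the same on every platform:

```python
import numpy as np

# dtype is fixed explicitly so the results do not depend on the platform
scores = np.array([85, 92, 78, 90, 88], dtype=np.int64)

print("Dimensions:", scores.ndim)             # 1 (a 1D array)
print("Bytes per element:", scores.itemsize)  # 8 (64 bits / 8)
print("Total bytes:", scores.nbytes)          # 40 (5 elements * 8 bytes)
```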

Different Ways to Create Arrays

NumPy provides several convenient functions for creating arrays:

# Array from a list of floating-point numbers
temperatures = np.array([25.5, 30.2, 28.8, 31.0, 27.5])
print("Temperatures:", temperatures)

# Array of zeros
zeros = np.zeros(5)
print("Zeros:", zeros)
# Output: [0. 0. 0. 0. 0.]

# Array of ones
ones = np.ones(4)
print("Ones:", ones)
# Output: [1. 1. 1. 1.]

# Range of numbers (similar to Python's range)
numbers = np.arange(0, 10)
print("Range 0-9:", numbers)
# Output: [0 1 2 3 4 5 6 7 8 9]

# Range with step size
evens = np.arange(0, 11, 2)
print("Even numbers:", evens)
# Output: [ 0  2  4  6  8 10]

These functions are useful when you need to initialize arrays with specific patterns before filling them with real data.
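
Two more creation functions follow the same pattern and are worth knowing about. This short sketch uses np.linspace and np.full with arbitrary example arguments:

```python
import numpy as np

# 5 evenly spaced values from 0 to 1, with both endpoints included
fractions = np.linspace(0, 1, 5)
print("Linspace:", fractions)
# Output: [0.   0.25 0.5  0.75 1.  ]

# Array of length 3 where every element is 7
sevens = np.full(3, 7)
print("Full:", sevens)
# Output: [7 7 7]
```

Unlike np.arange, where you choose the step, np.linspace lets you choose how many values you want and works out the spacing for you.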

Data Types Matter

NumPy arrays can hold different types of numbers:

# Integer array
int_array = np.array([1, 2, 3, 4])
print("Integer array dtype:", int_array.dtype)
# Output: int64 (on most platforms)

# Float array (has decimal points)
float_array = np.array([1.0, 2.0, 3.0, 4.0])
print("Float array dtype:", float_array.dtype)
# Output: float64

# Mixed types - NumPy converts to the most general type
mixed = np.array([1, 2.5, 3, 4.8])
print("Mixed array:", mixed)
print("Type:", mixed.dtype)
# Output: [1.  2.5 3.  4.8]
# Output: float64

When you mix integers and floats, NumPy automatically converts everything to floats because floats can represent both types of numbers. This automatic type conversion is called type promotion.
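
You can also choose the data type yourself rather than relying on promotion. The following is a minimal sketch using the dtype argument of np.array and the astype method; the variable names and values are illustrative only:

```python
import numpy as np

# Request floats even though the input values are integers
as_floats = np.array([1, 2, 3], dtype=np.float64)
print(as_floats, as_floats.dtype)
# Output: [1. 2. 3.] float64

# astype returns a new array; converting float to int drops the fractional part
prices = np.array([1.9, 2.5, -3.7])
as_ints = prices.astype(np.int64)
print(as_ints, as_ints.dtype)
# Output: [ 1  2 -3] int64
```

Note that the conversion to integers truncates toward zero instead of rounding, so 1.9 becomes 1 and -3.7 becomes -3.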

Real-World Example

Let’s create an array to hold student exam scores:

# Student exam scores
exam_scores = np.array([85, 92, 78, 88, 95, 72, 89, 91, 83, 87])

print("Exam scores:", exam_scores)
print(f"Number of students: {exam_scores.size}")
print(f"Data type: {exam_scores.dtype}")
print(f"Shape: {exam_scores.shape}")

This simple array already demonstrates NumPy’s power. You can now perform calculations on all scores simultaneously, which you will learn in later lessons.


Accessing Array Elements

Once you have an array, you need to retrieve specific values from it. NumPy provides powerful indexing and slicing capabilities.

Single Element Access

You access elements using square brackets and indices, just like Python lists. Remember that indexing starts at 0:

# Create sample array
data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Positive indices (count from the beginning)
print("Element at index 0:", data[0])    # 10
print("Element at index 5:", data[5])    # 60

# Negative indices (count from the end)
print("Last element:", data[-1])         # 100
print("Second to last:", data[-2])       # 90
print("Third from end:", data[-3])       # 80

Negative indexing is particularly useful when you do not know the array length but need to access elements near the end.

Here is a visual representation:

Array:  [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
Index:   0   1   2   3   4   5   6   7   8    9
Index:  -10 -9  -8  -7  -6  -5  -4  -3  -2   -1
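
Indexing can also be used to change an element, and an index outside the valid range raises an error. A brief sketch, reusing the data array from above:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Assign through an index to change a single element in place
data[0] = 15
print("Updated first element:", data[0])
# Output: 15

# Valid indices run from -10 to 9 here; anything else raises IndexError
try:
    data[10]
except IndexError as error:
    print("IndexError:", error)
```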

Slicing Arrays

Slicing allows you to extract multiple elements at once. The syntax is array[start:end], where start is included but end is not.

numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# Basic slicing
print("First 3 elements:", numbers[0:3])
# Output: [ 0 10 20]

print("Elements 2-5:", numbers[2:6])
# Output: [20 30 40 50]

print("Last 3 elements:", numbers[-3:])
# Output: [70 80 90]

Visual representation:

numbers[2:6] extracts:

Index:     0   1   2   3   4   5   6   7   8   9
Array:    [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
                  ^   ^   ^   ^
                  Start at 2, stop before 6
Result: [20, 30, 40, 50]
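
Unlike single-element access, slices are forgiving about bounds. The sketch below, which reuses the numbers array, shows that a slice reaching past the end is simply cut off at the last element:

```python
import numpy as np

numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# An end index past the array length is clamped to the end
tail = numbers[7:100]
print("numbers[7:100]:", tail)
# Output: [70 80 90]

# A start index past the end gives an empty array, not an error
empty = numbers[20:]
print("Size of numbers[20:]:", empty.size)
# Output: 0
```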

Slicing Shortcuts

Python’s slicing syntax provides convenient shortcuts for common patterns, and NumPy arrays support them as well:

numbers = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# From start to a specific index
print("From start to index 5:", numbers[:5])
# Output: [ 0 10 20 30 40]

# From a specific index to end
print("From index 6 to end:", numbers[6:])
# Output: [60 70 80 90]

# All elements (in NumPy this is a view of the same data, not a copy)
print("All elements:", numbers[:])
# Output: [ 0 10 20 30 40 50 60 70 80 90]
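
In NumPy, a basic slice such as numbers[:] or numbers[:3] is a view onto the original data rather than an independent copy, which differs from slicing a Python list. The sketch below, with a shortened example array, shows the practical consequence and how the copy method avoids it:

```python
import numpy as np

numbers = np.array([0, 10, 20, 30, 40])

# Basic slicing returns a view: it shares memory with the original array
view = numbers[:3]
view[0] = 99
print("Original after editing the view:", numbers)
# Output: [99 10 20 30 40]

# .copy() makes an independent array, so the original is left alone
independent = numbers[:3].copy()
independent[1] = -1
print("Original after editing the copy:", numbers)
# Output: [99 10 20 30 40]
```

If you plan to modify a slice without affecting the source array, calling .copy() first is the usual safeguard.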

Slicing with Steps

You can also specify a step size to skip elements:

# Syntax: array[start:end:step]
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Every 2nd element
print("Every 2nd element:", data[::2])
# Output: [0 2 4 6 8]

# Every 3rd element
print("Every 3rd element:", data[::3])
# Output: [0 3 6 9]

# Reverse the array
print("Reversed:", data[::-1])
# Output: [9 8 7 6 5 4 3 2 1 0]

# From index 1 to 8, every 2nd element
print("Index 1 to 8, step 2:", data[1:8:2])
# Output: [1 3 5 7]

The step parameter opens up many possibilities for extracting patterns from your data.
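
A negative step can be combined with start and end as well. In this sketch, which reuses the data array, note that start has to be the larger index when stepping backward:

```python
import numpy as np

data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Walk backward from index 8, stopping before index 2, two steps at a time
backward = data[8:2:-2]
print("data[8:2:-2]:", backward)
# Output: [8 6 4]

# With a negative step and start smaller than end, nothing is selected
nothing = data[2:8:-1]
print("Size of data[2:8:-1]:", nothing.size)
# Output: 0
```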

Practical Examples

Let’s apply these concepts to realistic scenarios:

# Monthly sales data (January to December)
sales = np.array([120, 135, 150, 145, 160, 175, 190, 185, 170, 155, 140, 165])

# First quarter (January, February, March)
q1 = sales[0:3]
print("Q1 sales:", q1)
# Output: [120 135 150]

# Last quarter (October, November, December)
q4 = sales[-3:]
print("Q4 sales:", q4)
# Output: [155 140 165]

# Summer months (June, July, August = indices 5, 6, 7)
summer = sales[5:8]
print("Summer sales:", summer)
# Output: [175 190 185]

# Every other month
bimonthly = sales[::2]
print("Bi-monthly:", bimonthly)
# Output: [120 150 160 190 170 140]

Here is another example with temperature data:

# Temperature readings (hourly for 24 hours)
temps = np.array([18, 17, 16, 15, 15, 16, 18, 21, 24, 27, 29, 31,
                  32, 33, 32, 30, 28, 26, 24, 22, 21, 20, 19, 18])

# Morning temperatures (6 AM up to, but not including, 12 PM = indices 6-11)
morning = temps[6:12]
print("Morning temps:", morning)
# Output: [18 21 24 27 29 31]

# Night temperatures (8 PM through the 11 PM reading = indices 20-23)
night = temps[20:]
print("Night temps:", night)
# Output: [21 20 19 18]

# Every 3 hours
every_3hrs = temps[::3]
print("Every 3 hours:", every_3hrs)
# Output: [18 15 18 27 32 30 24 20]

These examples demonstrate how slicing helps you extract meaningful subsets from your data without writing loops.


Practice Exercises

Now it is your turn to apply what you have learned. Try these exercises on your own before looking at solutions.

Exercise 1: Create and Explore Arrays

Create an array of 10 student test scores. Then print:

  • The array itself
  • Its shape
  • Its data type
  • Its size

Try this on your own:

# Your code here

Hint

Use np.array() with a list of 10 numbers. Then use .shape, .dtype, and .size properties.

Exercise 2: Access Elements

Given this array of prices, extract:

  • The first element
  • The last element
  • An element near the middle (index 5)

prices = np.array([10.5, 20.3, 15.8, 30.0, 25.5, 18.2, 22.7, 28.1, 19.5, 24.0])

# Your code here

Exercise 3: Slicing Practice

From the array below:

  • Get the first 5 elements
  • Get the last 3 elements
  • Get every 3rd element
  • Reverse the array

data = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

# Your code here

Hint

Use slicing syntax: array[start:end:step]. Remember that negative step reverses the array.


Summary

Congratulations! You have completed your first NumPy lesson. Let’s review what you learned.

Key Concepts

NumPy Fundamentals

  • NumPy is Python’s library for numerical computing and data analysis
  • Import with import numpy as np (standard convention)
  • Vectorized NumPy operations are often 10 to 100 times faster than pure-Python loops

Vectorization

  • Apply one operation to a whole array instead of looping over elements in Python
  • Eliminates the need for slow Python-level loops
  • Runs the per-element work in compiled C code, which can use SIMD (Single Instruction, Multiple Data) instructions
  • Essential for efficient data analysis

Creating Arrays

  • np.array([1, 2, 3]) creates an array from a list
  • np.zeros(5) creates an array of zeros
  • np.ones(4) creates an array of ones
  • np.arange(0, 10) creates a range of numbers

Array Properties

  • .shape shows array dimensions
  • .dtype shows the data type (int64, float64, etc.)
  • .size shows the total number of elements
  • len() also works with arrays

Accessing Elements

  • array[0] accesses the first element
  • array[-1] accesses the last element
  • array[2:5] slices elements from index 2 to 4
  • array[::2] selects every 2nd element
  • array[::-1] reverses the array

Why This Matters

These fundamentals form the foundation for everything you will do with data in Python. Whether you are analyzing sales data, processing sensor readings, or preparing data for machine learning, you will use these techniques constantly.

Vectorization is particularly important. As you work with larger datasets, the speed difference between loops and vectorized operations becomes critical. At a 100x speedup, for example, a task that takes 10 minutes with loops would finish in about 6 seconds with NumPy.


Next Steps

You now understand NumPy basics and can work with one-dimensional arrays. In the next lesson, you will learn about two-dimensional arrays and how to load real data from CSV files.


Continue Building Your Skills

You have taken your first step into the world of efficient numerical computing. The skills you learned here—creating arrays, understanding vectorization, and accessing data—will serve you throughout your entire data analytics career.

Keep practicing these concepts. The more comfortable you become with NumPy arrays, the more powerful your data analysis will be!