Lesson 1 - Introduction to Recurrent Neural Networks

Welcome to Sequence Models

This lesson opens the Sequence Models module. You will learn what makes data sequential, why the ordinary feedforward networks you have seen so far cannot model order, and how recurrent neural networks (RNNs) solve the problem by carrying a hidden state, a kind of memory, from one time step to the next. Along the way you will meet the dataset that runs through this entire module: the monthly closing level of the S&P 500 stock index from 1950 to today.

By the end of this lesson, you will be able to:

  • Explain what sequence data is and give examples across text, audio, and time series
  • Describe why a plain feedforward network cannot capture order or context in a sequence
  • Explain how an RNN adds a hidden state that carries memory across time steps
  • Read the RNN recurrence equation and interpret what each term contributes
  • Load and explore the S&P 500 monthly dataset with pandas, and preview the input shape a Keras recurrent layer expects

You should be comfortable with Python, pandas, and the basic idea of a neural network with layers, weights, and an activation function. You do not need any prior RNN experience. Let’s begin.


What Makes Data a Sequence?

Most of the datasets you have worked with so far were tables where each row stood on its own. The order of the rows did not matter: shuffle them and the meaning is unchanged. A customer’s age and income tell you about that customer regardless of who came before or after them in the file.

Sequence data is different. In a sequence, the order of the elements is part of the information. Rearranging the elements destroys or changes the meaning. Consider these examples:

  • Text. “The dog bit the man” and “The man bit the dog” use exactly the same words, but the order flips the meaning entirely.
  • Audio. A spoken word is a stream of sound samples; play them backward and the word is gone.
  • Time series. A stock index, a city’s daily temperature, or a patient’s heartbeat is a list of measurements indexed by time. What happened last month shapes what is plausible this month.
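
The text example can be checked in a few lines. This is a minimal sketch, not part of the original lesson code: it compares the two sentences once while ignoring order (as a bag of words) and once while respecting it. The variable names are illustrative.

```python
from collections import Counter

a = "The dog bit the man".lower().split()
b = "The man bit the dog".lower().split()

# Order-blind view: count how often each word appears.
same_words = Counter(a) == Counter(b)

# Order-aware view: compare the word lists position by position.
same_order = a == b

print("Same bag of words:", same_words)
print("Same sequence:", same_order)
```

A model that only sees word counts cannot tell these sentences apart; the difference lives entirely in the order.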

There is a useful distinction hiding inside these examples. Sequential data is any data where elements have a defined order. Temporal data is the special case where that order is time: each element carries a timestamp. All temporal data is sequential, but not all sequential data is temporal. Written text is sequential, yet a sentence has no clock attached to it. The S&P 500 index you will work with in this module is both sequential and temporal, because each value is tied to a specific month.

Order is the whole point

If you can shuffle your rows without losing meaning, you do not have sequence data, and an ordinary model is fine. The moment order carries information, you need a model that can remember what it has already seen. That is exactly what an RNN is built to do.


Why Feedforward Networks Fall Short

You have already met the feedforward network: an input layer, one or more hidden layers, and an output layer, with information flowing in a single direction from input to output. Each hidden layer takes its input only from the layer before it and passes its output only to the layer after it. Nothing flows sideways or backward.

A feedforward network treats every input as a fixed, self-contained bundle of features. That works beautifully when each example is independent. It breaks down on sequences for two related reasons.

It has no memory. When the network processes one element of a sequence, it has no record of the elements that came before. Predicting next month’s index from this month’s value alone throws away everything the trend was telling you.

It expects a fixed-size input. A feedforward network is wired for a fixed number of input features. But sequences come in different lengths: sentences have different numbers of words, audio clips run for different durations. There is no clean way to feed a variable-length stream into a layer expecting exactly $n$ inputs.

You could try to work around this by gluing several time steps together into one long input vector, say the last twelve monthly values side by side. That is a real technique, and you will use a version of it later in the module. But it is a blunt instrument: the network sees twelve numbers with no built-in notion that they are ordered in time, and the trick fails the moment a sequence is longer than the window you chose. What you really want is a model whose architecture understands sequence and order natively.
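
To make the fixed-size limitation concrete, here is a rough sketch of a single feedforward layer written in plain NumPy. The layer size, the random weights, and the helper name dense are assumptions chosen for illustration; they are not taken from the module's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 12, 4                  # layer wired for exactly 12 inputs
W = rng.normal(size=(n_inputs, n_units))   # random stand-in for learned weights
b = np.zeros(n_units)

def dense(x):
    """One feedforward layer: tanh(x @ W + b)."""
    return np.tanh(x @ W + b)

# A 12-value window matches the weight matrix and passes through.
window_12 = rng.normal(size=12)
print("12 inputs ->", dense(window_12).shape)

# A 15-value window does not fit the same weights.
window_15 = rng.normal(size=15)
try:
    dense(window_15)
    accepted = True
except ValueError as err:
    accepted = False
    print("15 inputs rejected:", err)
```

The weight matrix fixes the input width at construction time, so a longer sequence has to be truncated, padded, or handled by a different architecture.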


The Recurrent Idea: A Hidden State That Remembers

A recurrent neural network makes one deceptively small change to the feedforward design, and that change unlocks everything. In a feedforward layer, the output goes only forward to the next layer. In a recurrent layer, the output is also fed back into the layer itself on the next step through the sequence.

That feedback loop creates a hidden state: a vector the layer carries from one time step to the next. At every step the layer looks at two things, the new input for that step and the hidden state left over from the previous step, and produces an updated hidden state. The hidden state is the network’s memory. It is how information from early in the sequence can still influence a decision made much later.

It helps to picture the loop “unrolled” across time. Instead of drawing one box with an arrow curling back on itself, you draw one copy of the layer per time step and pass the hidden state along the chain. The same weights are reused at every step; only the inputs and the hidden state change.

Figure: An RNN unrolled across time steps: the hidden state flows left to right, carrying memory forward through the sequence.

The Recurrence Equation

The whole mechanism fits in one line of math. At time step $t$, the new hidden state $h_t$ is computed from the current input $x_t$ and the previous hidden state $h_{t-1}$:

$$h_t = \tanh\!\left(W_x\, x_t + W_h\, h_{t-1} + b\right)$$

Read it left to right. The term $W_x\, x_t$ is the contribution of the new information arriving at this step. The term $W_h\, h_{t-1}$ is the contribution of everything the network remembers from earlier in the sequence. They are added together, shifted by a bias $b$, and squeezed through an activation function (here the hyperbolic tangent, $\tanh$) to produce the updated memory $h_t$.
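
A tiny worked example may help before moving on. The sketch below applies the recurrence to scalars, with hand-picked values for the weights (0.5 and 0.8), a zero bias, and a zero initial hidden state. These numbers are assumptions for illustration, not learned parameters.

```python
import math

w_x, w_h, b = 0.5, 0.8, 0.0   # illustrative scalar weights, not learned values
h = 0.0                        # initial hidden state h_0
history = []

# Two steps with input 1.0, then one step with no new input at all.
for t, x in enumerate([1.0, 1.0, 0.0], start=1):
    h = math.tanh(w_x * x + w_h * h + b)   # the recurrence, one step
    history.append(h)
    print(f"t={t}  x_t={x}  h_t={h:.3f}")
```

With these values the hidden state is about 0.462 after the first step, 0.701 after the second, and 0.509 after the third. The third input is zero, yet the hidden state stays well above zero: that leftover value is the memory term at work.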

Two things are worth pausing on. First, the hidden state at one step becomes an input to the next step, which is the recurrence. Second, the weight matrices $W_x$ and $W_h$ are shared across every time step. The network does not learn a separate rule for “month 3” versus “month 50”; it learns a single rule for “given what I just saw and what I remember, update my memory,” and applies it everywhere. That sharing is why an RNN can handle sequences of any length with a fixed number of parameters.

One layer, many steps

When you see an RNN “unrolled” into a long chain of boxes, remember it is still one layer with one set of weights. The chain is just a way to visualize the same layer being applied once per time step. This is exactly why an RNN can read a 10-word sentence and a 100-word sentence with the same parameters.
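
The unrolled picture translates almost directly into a loop. What follows is a minimal NumPy sketch of the recurrence, assuming 8 hidden units, 1 input feature, and random weights; the function name rnn_forward is hypothetical, and this is not how Keras implements its layers internally.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Apply h_t = tanh(W_x x_t + W_h h_{t-1} + b) along a sequence.

    xs has shape (num_timesteps, num_features).
    Returns the final hidden state, shape (units,).
    """
    h = np.zeros(W_h.shape[0])              # h_0: empty memory
    for x_t in xs:                          # one loop pass per time step
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h

rng = np.random.default_rng(0)
units, features = 8, 1
W_x = rng.normal(scale=0.5, size=(units, features))
W_h = rng.normal(scale=0.5, size=(units, units))
b = np.zeros(units)

# Parameter count depends on units and features, never on sequence length.
n_params = W_x.size + W_h.size + b.size

h_short = rnn_forward(rng.normal(size=(10, features)), W_x, W_h, b)
h_long = rnn_forward(rng.normal(size=(100, features)), W_x, W_h, b)
print("parameters:", n_params)
print("final state shapes:", h_short.shape, h_long.shape)
```

Both calls reuse the same 80 parameters (8 + 64 + 8), whether the sequence has 10 steps or 100. Only the number of loop iterations changes.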


RNNs in the Wild

Because an RNN carries memory, it shines on exactly the data that defeats a feedforward network. A few representative applications:

  • Time series forecasting. Predicting sales, weather, energy demand, or, as in this module, a stock index from its own history.
  • Text analysis and generation. Tagging parts of speech, classifying sentiment, autocomplete, and machine translation all depend on reading words in order.
  • Audio and speech. Transcribing speech or recognizing a spoken command means processing sound as an ordered stream.

The SimpleRNN layer you will build in the next lesson is the most basic recurrent layer, and it is the one that follows the equation above almost literally. Keras also offers two more powerful recurrent layers, the GRU (Gated Recurrent Unit) and the LSTM (Long Short-Term Memory). They use the same carry-the-state idea but add internal “gates” that help them remember information over much longer spans. You will meet those in a later lesson; for now it is enough to know they exist and that they are drop-in replacements for SimpleRNN.

RNNs are not the only specialized network

Where an RNN is built for sequences, a convolutional neural network (CNN) is built for grid-like data such as images, sliding small filters across the input to detect local patterns. The two are not rivals: a later lesson combines convolutional and recurrent layers to process data that is both sequential and high-dimensional, like video. For this module, sequences are our focus.


Meet the Dataset: S&P 500 Monthly

Every lesson in this module returns to one real time series, so it is worth getting to know it now. The S&P 500 is a stock-market index that tracks 500 large U.S. companies; its level is a common shorthand for “how the U.S. stock market is doing.” The file sp500_monthly.csv holds the index level at monthly resolution, stretching from 1950 all the way to the present.

This is a textbook example of temporal sequence data: a single number measured once a month, where each value is tied to a date and the order is everything. It is the perfect playground for recurrent models.

Download the file and load it with pandas. You will use TensorFlow and Keras throughout the module, so import them now as well.

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# download: https://datatweets.com/datasets/sp500_monthly.csv
df = pd.read_csv("sp500_monthly.csv", parse_dates=["date"])

print("Shape:", df.shape)
print(df.head())
# Output:
# Shape: (917, 2)
#         date  price
# 0 1950-01-01  16.92
# 1 1950-02-01  16.99
# 2 1950-03-01  17.30
# 3 1950-04-01  17.96
# 4 1950-05-01  18.78

The dataset has two columns, date and price, and one row per month. Let’s confirm the span and the range of values.

print("From", df["date"].min().date(), "to", df["date"].max().date())
print(f"Price range: {df['price'].min():.1f} to {df['price'].max():.1f}")
# Output:
# From 1950-01-01 to 2026-05-01
# Price range: 16.9 to 7412.6

So the series runs from January 1950 to May 2026, and the index climbed from under 17 points in 1950 to over 7,400 points at its peak. That is an increase of more than 400-fold (roughly 440 times the starting level) over the period.

Why the Numbers Span Such a Huge Range

A value that grows from 17 to 7,400 poses a practical problem for any model: the early decades are crammed into a sliver near zero while recent years dominate the scale. Plotting the raw level on a normal axis would make the 1950s look almost flat. The standard fix for data that grows by multiples is to view it on a logarithmic scale, where equal percentage changes take equal vertical space. On a log axis, the steady long-run growth of the market shows up as a roughly straight upward line, with recessions visible as dips.
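
The claim that equal percentage changes take equal vertical space is easy to verify numerically. The sketch below uses the price range printed earlier in this lesson plus two made-up doublings (17 to 34 and 3,400 to 6,800), which are illustrative values rather than actual rows from the dataset.

```python
import math

low, high = 16.9, 7412.6          # price range printed earlier in the lesson
ratio = high / low
print(f"Fold increase: {ratio:.1f}x")

# Two doublings, one early and one late (illustrative values).
linear_early = 34 - 17
linear_late = 6800 - 3400
log_early = math.log10(34) - math.log10(17)
log_late = math.log10(6800) - math.log10(3400)

print("Linear gaps:", linear_early, "vs", linear_late)
print(f"Log gaps:    {log_early:.4f} vs {log_late:.4f}")
```

On a linear axis the late doubling is 200 times taller than the early one; on a log axis both cover the same distance, log10(2), which is why the 1950s stop looking flat.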

Figure: The S&P 500 monthly index from 1950 to present on a log scale: decades of growth punctuated by sharp downturns.

You can reproduce the log view with a couple of lines. (The plot will display on screen; the exact pixels are not something to memorize.)

import matplotlib.pyplot as plt

plt.figure(figsize=(9, 4))
plt.plot(df["date"], df["price"])
plt.yscale("log")            # equal percentage moves take equal vertical space
plt.title("S&P 500 monthly index (log scale)")
plt.xlabel("year")
plt.ylabel("index level")
plt.show()

A Glimpse of How the Data Becomes Sequences

You will not train a model in this lesson, but it is worth seeing the shape recurrent layers expect, because it directly mirrors the recurrence equation. A Keras recurrent layer wants its input as (num_timesteps, num_features) per example: a window of consecutive time steps, each carrying some number of features.

For this univariate series there is a single feature per step, the index level, and a natural window is twelve months, one year of history used to predict what comes next. The full module turns the raw series into batches of overlapping 12-month windows shaped (batch, 12, 1). You will see the windowing code in detail in the next lesson; for now, just connect the shape to the picture: each window is one walk along the unrolled RNN, twelve steps long, with the hidden state carried from the first month to the last.

prices = df["price"].to_numpy(dtype="float32")
window = 12   # one year of monthly history per example
print("Total months:", prices.shape[0])
print("Per-example input shape a recurrent layer expects:", (window, 1))
# Output:
# Total months: 917
# Per-example input shape a recurrent layer expects: (12, 1)

That (12, 1) is the same input_shape=(num_timesteps, num_features) you will pass to your first recurrent layer in Lesson 2.
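
As a preview of where the (batch, 12, 1) shape comes from, here is one possible way to cut a series into overlapping windows with NumPy. It is a sketch under assumptions: the prices are a synthetic straight line standing in for the real file, the target is taken to be the month right after each window, and the windowing code in the next lesson may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic stand-in for the 917 monthly prices (not the real data).
prices = np.linspace(17.0, 7400.0, num=917, dtype="float32")
window = 12

# Every run of 12 consecutive months: shape (906, 12).
windows = sliding_window_view(prices, window)

# Drop the last window, which has no following month to predict,
# and add a trailing feature axis: shape (905, 12, 1).
X = windows[:-1, :, np.newaxis]

# Target for each window is the month that comes right after it.
y = prices[window:]

print("X shape:", X.shape)
print("y shape:", y.shape)
```

Each row of X is one 12-step walk along the unrolled RNN, and the matching entry of y is the value the model would be asked to predict.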


Practice Exercises

Try these before checking the hints. They use only what this lesson covered: loading and exploring the series, and reasoning about sequences.

Exercise 1: Sequential or Not?

For each item below, decide whether it is sequence data, and if so, whether it is also temporal: (a) a list of daily closing prices for a stock, (b) a table of customers with age and income, (c) the characters in the word “machine”, (d) hourly temperature readings for a city.

Hint

Ask two questions in order. First, does reordering the elements change the meaning? If yes, it is sequential. Second, is the ordering specifically time, with a timestamp on each element? If yes, it is also temporal. The customer table (b) is neither; the word “machine” (c) is sequential but not temporal; (a) and (d) are both sequential and temporal.

Exercise 2: Resample the Series to Yearly

The dataset is monthly. Compute the yearly average index level and print the first five years, so you can see the long-run climb without monthly noise.

import pandas as pd
df = pd.read_csv("sp500_monthly.csv", parse_dates=["date"])

# Your code here

Hint

Add a year column with df["year"] = df["date"].dt.year, then group by it: df.groupby("year")["price"].mean().head(). Because the series starts at about 17 in 1950, the first yearly averages should start in the high teens and climb into the twenties.

Exercise 3: Measure the Monthly Growth Rate

Compute the month-over-month percentage change of the price column and print its mean. This is the kind of relative change a log scale makes visually uniform.

import pandas as pd
df = pd.read_csv("sp500_monthly.csv", parse_dates=["date"])

# Your code here

Hint

Use df["price"].pct_change() to get the fractional change from each month to the next (multiply by 100 if you want it as a percentage), then take .mean(). The first value will be NaN because there is no prior month to compare against, which is expected; the mean ignores it. You should get a small positive average, reflecting the market’s long-run upward drift.


Summary

You now have the conceptual foundation for everything that follows in this module. Let’s review.

Key Concepts

Sequence Data

  • In sequence data the order of elements carries meaning; shuffling it changes or destroys the information
  • Sequential data has a defined order; temporal data is the special case where that order is time
  • Text, audio, and time series are the classic examples; the S&P 500 monthly index is both sequential and temporal

Why Feedforward Networks Fail

  • A feedforward network has no memory of earlier elements and expects a fixed-size input
  • Gluing time steps into one vector is a blunt workaround that ignores order and breaks on variable-length sequences

The Recurrent Idea

  • An RNN feeds a layer’s output back into itself, creating a hidden state that carries memory across time steps
  • The recurrence $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$ blends new input with remembered context
  • The weights are shared across all time steps, so one RNN handles sequences of any length with fixed parameters

The Dataset

  • sp500_monthly.csv is the monthly S&P 500 index from 1950 to 2026, 917 rows of date and price
  • Values span roughly 17 to 7,400, so a log scale is the natural way to view its multiplicative growth
  • Recurrent layers expect input shaped (num_timesteps, num_features); here a 12-month window gives (12, 1)

Why This Matters

Almost every interesting real-world signal is a sequence: language, speech, sensor streams, markets, biological data. The feedforward tools you already know simply cannot see order, and order is where the information lives. The recurrent idea, carrying a hidden state forward in time, is the first and most important step toward models that understand sequences, and it is the seed from which GRUs, LSTMs, and even modern attention-based models grew. Internalize the recurrence equation and the unrolled picture now, and the code in the coming lessons will read as a direct translation of ideas you already understand.


Next Steps

You understand why recurrent networks exist and what the hidden state does. Next, you will translate that intuition into working Keras code, building your first SimpleRNN and feeding it real windows of the S&P 500 series.

Continue to Lesson 2 - Basic RNN Architecture

Build your first SimpleRNN in Keras and learn how the recurrence equation becomes a working layer.


Keep Building Your Skills

You have taken the first step into sequence modeling: you can now spot when data is a sequence, explain why ordinary networks cannot handle it, and describe how a hidden state carries memory through time. Keep the unrolled RNN picture in mind as you move forward. Every recurrent layer you build from here, simple RNNs, GRUs, and LSTMs, is a variation on the same core idea you learned today: read one step, update your memory, and pass it on.