Lesson 5 - Deep Learning with the Keras Functional API
Welcome to the Keras Functional API
So far you have built networks by stacking layers one after another with the Sequential API. That style is clean and readable, and it covers a huge share of real models. But it can only describe a single straight line of layers: input at the top, output at the bottom, nothing branching off in between. The moment you need two inputs, two outputs, a shared layer, or a model that splits and rejoins, the Sequential API runs out of room. This lesson introduces the tool that handles all of those cases: the Keras functional API.
By the end of this lesson, you will be able to:
- Explain the limitations of the Sequential API and why the functional API exists
- Define a standalone Input layer and connect layers by calling each layer on the previous one
- Rebuild a binary classifier with the functional API and read its model.summary()
- Build a branching model with two parallel Dense paths joined by a Concatenate layer
- Decide when to prefer the functional API over the Sequential API
You should already be comfortable building, compiling, and training a Sequential model in Keras, and with the basic data-preparation steps from earlier lessons. Let’s begin.
Two Ways to Describe the Same Network
A neural network is really a graph: layers are nodes, and the arrows between them say which layer’s output feeds into which layer’s input. The Sequential and functional APIs are just two different ways of writing that graph down in code.
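To make the graph idea tangible before any Keras code, here is a minimal sketch in plain Python. The layer names and the `is_single_chain` helper are illustrative assumptions, not part of Keras; the dictionaries only record which layer feeds which.

```python
# Each key is a layer; each value lists the layers that consume its output.
# The names are illustrative labels, not Keras objects.
chain = {
    "input": ["dense_64"],
    "dense_64": ["dense_1"],
    "dense_1": [],
}

branching = {
    "input": ["branch_a", "branch_b"],  # one output feeds two layers
    "branch_a": ["merge"],
    "branch_b": ["merge"],
    "merge": ["output"],                # merge receives from two layers
    "output": [],
}


def is_single_chain(graph):
    """Return True if no layer has more than one consumer or more than one producer."""
    incoming = {name: 0 for name in graph}
    for consumers in graph.values():
        if len(consumers) > 1:
            return False
        for consumer in consumers:
            incoming[consumer] += 1
    return all(count <= 1 for count in incoming.values())


print(is_single_chain(chain))      # True
print(is_single_chain(branching))  # False
```

The first graph is the only kind the Sequential API can describe; the second needs the functional API.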
The Sequential API assumes the graph is a single unbranched chain. You hand Keras a list of layers, and it wires them top to bottom for you:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
This is wonderfully compact. The cost of that convenience is rigidity: each layer has exactly one input (the layer before it) and one output (the layer after it). There is no way to express a layer that receives from two places, or sends its output to two places.
The functional API drops that assumption. Instead of a list, you build the graph by hand, one connection at a time, by calling each layer on the tensor it should consume. The result is the same kind of model object, but you control every arrow.
import tensorflow as tf
inputs = tf.keras.Input(shape=(12,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
Both snippets describe the same network. The difference is purely in how you express the wiring. Once you understand the three things that change, the functional style becomes second nature.
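One quick way to check that the two styles really describe the same network is to compare their parameter counts. The sketch below assumes a working TensorFlow installation; the variable names `seq_model` and `func_model` are chosen here for illustration.

```python
import tensorflow as tf

# Sequential version: Keras wires the list top to bottom.
seq_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Functional version: every connection is written out explicitly.
inputs = tf.keras.Input(shape=(12,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
func_model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Hidden layer: 12*64 + 64 = 832 parameters. Output layer: 64*1 + 1 = 65.
print(seq_model.count_params(), func_model.count_params())  # 897 897
```

Matching counts do not prove the graphs are identical in general, but for two such small models they are a convenient sanity check.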
The Three Differences
Every functional model differs from a Sequential one in exactly three places:
- The input is defined on its own. You create a standalone tf.keras.Input(...) tensor instead of passing the shape to the first hidden layer.
- The layers are connected by calling them. Dense(64, activation="relu")(inputs) means “create a Dense layer, then run inputs through it.” The trailing (inputs) is what wires the arrow.
- The model is created explicitly. You finish by calling tf.keras.Model(inputs=..., outputs=...) and naming which tensors are the entry and exit points.
Keep these three ideas in mind and you can translate any Sequential model into functional form, and, far more usefully, build models the Sequential API simply cannot.
A layer is callable
The piece that surprises most people is the double set of parentheses: Dense(64)(inputs). The first pair constructs a layer object. The second pair calls that object on a tensor, returning a new tensor. Reading it as two steps, “make the layer, then apply it,” makes the syntax click.
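The double parentheses are ordinary Python: any object whose class defines `__call__` can be called like a function. The toy `Scale` class below is a hypothetical stand-in written for this illustration, not a Keras layer, but it follows the same construct-then-apply pattern.

```python
class Scale:
    """Toy stand-in for a layer: configured once, then applied to data."""

    def __init__(self, factor):
        # First pair of parentheses: construct and configure the object.
        self.factor = factor

    def __call__(self, values):
        # Second pair of parentheses: apply the object to an input.
        return [v * self.factor for v in values]


# One-line form, mirroring Dense(64)(inputs)
one_step = Scale(3)([1, 2, 3])

# Two-step form, which does exactly the same thing
layer = Scale(3)
two_step = layer([1, 2, 3])

print(one_step, two_step)  # [3, 6, 9] [3, 6, 9]
```

Keeping the layer object in its own variable, as in the two-step form, is also what makes it possible to apply one layer to several inputs later on.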
The Problem We’ll Model
To make this concrete, you will reuse the prediction task from earlier in this module: given the financial details of a company’s initial public offering (IPO), predict whether the stock gained on its first day of trading. This is a binary classification problem, exactly the kind the functional API handles well.
You will work with the real Indian IPO dataset, a record of companies that listed on Indian stock exchanges along with the offering’s financial characteristics and whether the stock closed its first day above its issue price.
import pandas as pd
# download: https://datatweets.com/datasets/indian_ipo.csv
df = pd.read_csv("indian_ipo.csv")
print("Shape:", df.shape)
# Output: Shape: (319, 10)
The dataset has 319 rows and 10 columns. Each row is one IPO. The final column is the target: 1 if the stock gained on listing day, 0 if it did not.
print(df["listing_gain"].value_counts())
# Output:
# listing_gain
# 1 174
# 0 145
# Name: count, dtype: int64
print("gain rate:", round(df["listing_gain"].mean(), 3))
# Output: gain rate: 0.545
About 54.5 percent of these IPOs gained on their first day (174 out of 319). That is a reasonably balanced target, which means accuracy will be a fair first measure of performance.
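A useful companion number is the accuracy you would get by always predicting the more common class. The sketch below only redoes the arithmetic from the class counts printed above; the idea of a majority-class baseline is an addition here, offered as a reference point for any accuracy you measure later.

```python
# Class counts taken from the value_counts() output above.
gained, not_gained = 174, 145
total = gained + not_gained

# Accuracy of a "model" that always predicts the more common class.
majority_baseline = max(gained, not_gained) / total

print("total rows:", total)                               # total rows: 319
print("majority baseline:", round(majority_baseline, 3))  # majority baseline: 0.545
```

Any classifier worth keeping should beat this figure on held-out data.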
Preparing the Data
The preparation mirrors what you have done before, so we move quickly. You separate the predictors from the target, scale every predictor by dividing each column by its maximum (which lands the values in the [0, 1] range as long as none are negative) so no single feature dominates, and split off a test set you never train on.
import numpy as np
from sklearn.model_selection import train_test_split
target = ["listing_gain"]
predictors = [c for c in df.columns if c not in target]
# Scale every predictor into [0, 1]
df[predictors] = df[predictors] / df[predictors].max()
X = df[predictors].astype(np.float32).values
y = df[target].astype(np.float32).values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=100
)
print("Train features:", X_train.shape)
print("Test features: ", X_test.shape)
# Output:
# Train features: (255, 9)
# Test features: (64, 9)
You now have nine numeric predictors, a scaled training set, and a held-out test set. Everything that follows builds on these arrays.
Building the Classifier with the Functional API
Now you rebuild the IPO classifier the functional way. Read this section slowly: each line corresponds to exactly one node in the network graph, and the order in which you write them is the order data flows through them.
Step 1: The Standalone Input Layer
A functional model begins with an Input tensor that declares the shape of one observation. You pass the shape as a tuple, not a bare integer, and you leave out the batch dimension because Keras adds it automatically.
import tensorflow as tf
inputs = tf.keras.Input(shape=(X_train.shape[1],))
X_train.shape[1] is 9, so this declares an input that accepts nine features per row. Unlike the Sequential API, where the input shape rides along on the first hidden layer, here it is its own explicit object that the rest of the graph hangs off of.
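To see the batch dimension that Keras adds for you, you can inspect the shape of the Input tensor. This is a small sketch assuming TensorFlow is installed; `n_features` is a stand-in for `X_train.shape[1]` so the snippet runs on its own.

```python
import tensorflow as tf

n_features = 9  # stands in for X_train.shape[1] in the lesson

# shape describes ONE observation; the trailing comma makes it a 1-element tuple.
inputs = tf.keras.Input(shape=(n_features,))

# The leading None is the batch dimension that Keras prepends.
print(tuple(inputs.shape))  # (None, 9)
```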
Step 2: Connect Hidden Layers
Each hidden layer is created and immediately called on the tensor that should feed it. The tensor in the trailing parentheses is the incoming connection.
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dropout(rate=0.3)(x)
x = tf.keras.layers.Dense(32, activation="relu")(x)
x = tf.keras.layers.Dropout(rate=0.2)(x)
x = tf.keras.layers.Dense(16, activation="relu")(x)
Reusing the name x is a common convention: each line overwrites x with the tensor coming out of the new layer, so the next line picks up exactly where the last left off. You could give every tensor a distinct name (hidden1, hidden2, and so on); the threaded-x style just keeps the graph readable when it is a straight chain.
The Dropout layers are a quick aside worth naming. Dropout is a regularization technique that randomly zeroes a fraction of a layer’s outputs during training. By forcing the network not to rely on any single neuron, it makes the model less likely to overfit the training data. A rate=0.3 means roughly 30 percent of the units are dropped on each training step; at prediction time dropout turns itself off automatically.
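The mechanics of dropout can be sketched in a few lines of plain Python. This is not the Keras implementation, only an illustration of the same idea; the `dropout` function and its arguments are names invented for this sketch. Like Keras, it scales the surviving values by 1 / (1 - rate) so the expected total stays the same.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout on a list of activations."""
    if not training:
        # At prediction time the values pass through unchanged.
        return list(values)
    keep_prob = 1.0 - rate
    # Zero each unit with probability `rate`; scale survivors by 1/keep_prob
    # so the expected sum of the activations stays the same.
    return [v / keep_prob if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10_000

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
infer_out = dropout(activations, rate=0.3, training=False, rng=rng)

dropped_fraction = train_out.count(0.0) / len(train_out)
print("fraction dropped:", round(dropped_fraction, 2))
print("unchanged at inference:", infer_out == activations)
```

With 10,000 units the dropped fraction should land close to the requested 0.3, while the inference call returns the input untouched.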
Step 3: The Output Layer
Because this is binary classification, the final layer is a single neuron with a sigmoid activation, which squashes any real number into a probability between 0 and 1.
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
Step 4: Instantiate the Model
The graph is fully wired. You turn it into a trainable model by naming its entry and exit tensors.
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()
# Output:
# Model: "functional"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  input_layer (InputLayer)    [(None, 9)]               0
#  dense (Dense)               (None, 64)                640
#  dropout (Dropout)           (None, 64)                0
#  dense_1 (Dense)             (None, 32)                2080
#  dropout_1 (Dropout)         (None, 32)                0
#  dense_2 (Dense)             (None, 16)                528
#  dense_3 (Dense)             (None, 1)                 17
# =================================================================
# Total params: 3,265
# Trainable params: 3,265
# Non-trainable params: 0
# _________________________________________________________________
The summary reads top to bottom in data-flow order. The None in every output shape is the batch dimension, left flexible so the model accepts any number of rows at once. Dropout layers have zero parameters because they only mask values; they learn nothing.
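The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` is an assumption made for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# (inputs, units) for each Dense layer in the summary above
layer_sizes = [(9, 64), (64, 32), (32, 16), (16, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(counts)       # [640, 2080, 528, 17]
print(sum(counts))  # 3265
```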
Step 5: Compile, Train, and Evaluate
From here, training a functional model is identical to training a Sequential one. You compile with an optimizer, a loss, and a metric, then fit and evaluate.
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"],
)
history = model.fit(X_train, y_train, epochs=150, verbose=0)
print("final train loss:", round(history.history["loss"][-1], 4))
# Output: final train loss: 0.2529
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy: {test_acc:.3f}")
# Output: test accuracy: 0.537
The training curve below shows the loss falling steadily across epochs as the optimizer tunes the weights.
A test accuracy near 0.537, with a final training loss of about 0.2529, tells an honest story: the model fits the training data well but barely beats a coin flip on unseen IPOs, and it falls just short of the 54.5 percent you would score by always predicting a gain on the full dataset. That gap is a textbook hint of overfitting, and it also reflects how hard it is to predict a chaotic, news-driven outcome like first-day stock movement from a few financial features. The point here is not a record-breaking score; it is that you built and trained this model entirely with the functional API. The numbers will vary slightly on your machine because weight initialization and dropout are random.
Watch the train/test gap
When training performance is much stronger than test performance, the model has memorized patterns that do not generalize. Dropout is one defense, and you saw it wired into this network. If the gap stays wide, more regularization, more data, or a simpler model are the usual next moves. Always evaluate on data the model never saw during training.
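One way to watch the gap while training, rather than only at the end, is to hold out part of the training data with the `validation_split` argument of `fit`. The sketch below uses random synthetic data purely to show the mechanics, so its numbers carry no meaning; the small model and the five epochs are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (random noise), used only to show the mechanics.
rng = np.random.default_rng(0)
X = rng.random((200, 9)).astype("float32")
y = rng.integers(0, 2, size=(200, 1)).astype("float32")

inputs = tf.keras.Input(shape=(9,))
x = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# validation_split holds out the last 20% of the rows passed to fit().
history = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)

# One value per epoch for each curve; a widening gap suggests overfitting.
losses = zip(history.history["loss"], history.history["val_loss"])
for epoch, (train_loss, val_loss) in enumerate(losses, start=1):
    print(f"epoch {epoch}: train loss {train_loss:.4f}, val loss {val_loss:.4f}")
```

The validation rows come out of the training set, so the separate test set stays untouched for the final evaluation.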
Where the Functional API Earns Its Keep: Branching
Everything so far could have been written with the Sequential API, because the chain never branched. The real payoff arrives when the graph stops being a straight line. The functional API lets a single input feed two parallel paths that each transform the data differently, then merges them back together before the output.
The diagram below shows the shape you are about to build: one input that splits into two independent Dense branches, which are then concatenated and sent through a final output layer.
Here is the same structure in code. Notice that both branch layers are called on the same inputs tensor, which is exactly what the Sequential API cannot express.
inputs = tf.keras.Input(shape=(X_train.shape[1],))
# Branch A: a wider, shallow path
branch_a = tf.keras.layers.Dense(32, activation="relu")(inputs)
# Branch B: a narrower path that looks at the input differently
branch_b = tf.keras.layers.Dense(16, activation="relu")(inputs)
# Merge the two branches into a single tensor
merged = tf.keras.layers.Concatenate()([branch_a, branch_b])
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(merged)
branching_model = tf.keras.Model(inputs=inputs, outputs=outputs)
branching_model.summary()
# Output:
# Model: "functional_1"
# __________________________________________________________________________________________________
#  Layer (type)        Output Shape     Param #    Connected to
# ==================================================================================================
#  input_layer_1       [(None, 9)]      0          []
#  dense_4 (Dense)     (None, 32)       320        ['input_layer_1[0][0]']
#  dense_5 (Dense)     (None, 16)       160        ['input_layer_1[0][0]']
#  concatenate         (None, 48)       0          ['dense_4[0][0]', 'dense_5[0][0]']
#  dense_6 (Dense)     (None, 1)        49         ['concatenate[0][0]']
# ==================================================================================================
# Total params: 529
# Trainable params: 529
# Non-trainable params: 0
# __________________________________________________________________________________________________
Read the Connected to column, which appears only when the graph is not a single straight chain of layers. Both dense_4 and dense_5 connect to the same input_layer_1, confirming the split. The Concatenate layer then lists two inputs and produces a width-48 tensor (32 + 16), which the final Dense consumes. That branch-and-merge pattern is the functional API’s signature move.
branching_model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"],
)
branching_model.fit(X_train, y_train, epochs=150, verbose=0)
# trains exactly like any other Keras model
Once built, the branching model compiles, fits, and evaluates with the same calls as before. The graph got more interesting; the training loop did not change at all.
Concatenate vs. Add
Concatenate stacks two tensors side by side, so a width-32 and a width-16 branch become width-48 and every feature survives. Add instead sums them element-wise, which requires the branches to have the same width and blends them rather than preserving both. Concatenation keeps more information; addition is common in residual connections. Pick based on whether you want to combine or to preserve.
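The difference is easiest to see on concrete numbers. The sketch below applies both merge layers to two tiny constant tensors; the values are made up for illustration and TensorFlow is assumed to be installed.

```python
import tensorflow as tf

# Two tiny "branch outputs": a batch of one row, two features each.
branch_a = tf.constant([[1.0, 2.0]])
branch_b = tf.constant([[10.0, 20.0]])

concatenated = tf.keras.layers.Concatenate()([branch_a, branch_b])
added = tf.keras.layers.Add()([branch_a, branch_b])

print(concatenated.numpy().tolist())  # [[1.0, 2.0, 10.0, 20.0]]
print(added.numpy().tolist())         # [[11.0, 22.0]]
```

Concatenate doubles the width here and keeps all four values; Add keeps the width at two and blends the branches into sums.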
When to Use Which API
Neither API is “better.” They solve different shapes of problem, and the functional API is a strict superset: anything Sequential can do, functional can do too, just more verbosely.
Reach for Sequential when...       Reach for Functional when...
-------------------------------    -------------------------------
The model is one straight chain    The model branches or merges
Input -> layers -> output          Multiple inputs or outputs
You want the most compact code     Layers are shared / reused
Quick prototypes & tutorials       Residual or skip connections

In practice the guideline is simple: start with Sequential, switch to functional the moment your graph stops being a straight line. If you find yourself wishing a layer could read from two places, send its output to two places, or be reused, that wish is the signal to move to the functional API.
A few concrete cases that demand the functional style:
- Multiple inputs, such as a model that takes both an image and a table of metadata, processing each with its own sub-network before merging.
- Multiple outputs, such as predicting both a category and a price from the same features.
- Shared layers, where one layer is applied to two different inputs so they are embedded into the same space.
- Branching and skip connections, like the parallel-path model you just built, or the residual blocks at the heart of modern architectures.
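Several of these cases can be combined in one small sketch: two inputs, one Dense layer shared between them, and two outputs. All shapes, layer names, and the category/price split below are hypothetical choices made for illustration, and the model is only built, not trained.

```python
import tensorflow as tf

# Hypothetical shapes and names, chosen only for illustration.
input_a = tf.keras.Input(shape=(8,), name="input_a")
input_b = tf.keras.Input(shape=(8,), name="input_b")

# Shared layer: ONE Dense object applied to both inputs, so both are
# projected with the same weights.
shared_dense = tf.keras.layers.Dense(16, activation="relu", name="shared_dense")
encoded_a = shared_dense(input_a)
encoded_b = shared_dense(input_b)

merged = tf.keras.layers.Concatenate(name="merged")([encoded_a, encoded_b])

# Two outputs computed from the same merged features.
category = tf.keras.layers.Dense(3, activation="softmax", name="category")(merged)
price = tf.keras.layers.Dense(1, name="price")(merged)

model = tf.keras.Model(inputs=[input_a, input_b], outputs=[category, price])

# shared_dense is counted once (8*16 + 16 = 144), plus 99 and 33 for the heads.
print(len(model.inputs), len(model.outputs), model.count_params())  # 2 2 276
```

Training a model like this would need one loss per output when you compile it, which is a detail beyond what this lesson covers.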
Practice Exercises
Try these before checking the hints. Reuse X_train, X_test, y_train, and y_test from the lesson.
Exercise 1: Translate a Sequential Model
Take this Sequential model and rewrite it using the functional API so it describes the exact same network. Build it, then call .summary() to confirm the layers match.
seq = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
# Your code here: rebuild this with the functional API
Hint
Create inputs = tf.keras.Input(shape=(X_train.shape[1],)), then x = tf.keras.layers.Dense(32, activation="relu")(inputs), then outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x), and finish with tf.keras.Model(inputs=inputs, outputs=outputs). The parameter counts in .summary() should be identical to the Sequential version.
Exercise 2: Add a Third Branch
Extend the branching model from the lesson with a third parallel Dense branch of 8 units that also reads from inputs, and concatenate all three branches before the output layer.
inputs = tf.keras.Input(shape=(X_train.shape[1],))
branch_a = tf.keras.layers.Dense(32, activation="relu")(inputs)
branch_b = tf.keras.layers.Dense(16, activation="relu")(inputs)
# Your code here: add branch_c, concatenate all three, add the output
Hint
Create branch_c = tf.keras.layers.Dense(8, activation="relu")(inputs), then pass a list of all three to the merge: tf.keras.layers.Concatenate()([branch_a, branch_b, branch_c]). The concatenated tensor will have width 32 + 16 + 8 = 56, which you can confirm in the summary.
Exercise 3: Compare Concatenate with Add
Build a branching model whose two branches each output 16 units, then merge them with tf.keras.layers.Add() instead of Concatenate(). Check the merged layer’s output shape in the summary and explain why Add would fail if the branches had different widths.
inputs = tf.keras.Input(shape=(X_train.shape[1],))
# Your code here: two width-16 branches, merged with Add()
Hint
Give both branches Dense(16, activation="relu")(inputs), then merge with tf.keras.layers.Add()([branch_a, branch_b]). The merged shape stays (None, 16) because Add sums element-wise. If the branches were different widths there would be no element-to-element correspondence, so the shapes would not align and Keras would raise an error.
Summary
You now know both ways to describe a Keras model and, more importantly, when each one is the right tool. Let’s review.
Key Concepts
Why the Functional API Exists
- The Sequential API can only describe a single unbranched chain: one input, one output, one layer feeding the next
- The functional API treats a model as a graph you wire by hand, so it can express branches, merges, multiple inputs, multiple outputs, and shared layers
- Anything Sequential can build, functional can build too; it is a strict superset
The Three Differences
- Define a standalone tf.keras.Input(shape=(...,)) tensor instead of putting the shape on the first hidden layer
- Connect layers by calling them: Dense(64, activation="relu")(prev_tensor) returns the next tensor
- Finish with tf.keras.Model(inputs=..., outputs=...) to name the entry and exit points
Building and Training
- The classifier used scaled predictors, dropout for regularization, and a sigmoid output for binary classification
- model.summary() lists layers in data-flow order; None is the flexible batch dimension and dropout layers have zero parameters
- Compiling, fitting, and evaluating a functional model use the exact same calls as a Sequential model
Branching
- A single inputs tensor can feed two or more parallel Dense branches
- Concatenate stacks branches side by side (widths add); Add sums them element-wise (widths must match)
- The Connected to column in the summary reveals the graph’s true wiring
Why This Matters
The functional API is the gateway from textbook networks to the architectures used in practice. Real models rarely stay on a single straight line: recommendation systems fuse user and item towers, multimodal models combine text and images, and the residual connections inside nearly every modern deep network are branches that split and rejoin. Each of those is a graph, and the functional API is how you write graphs in Keras.
Just as important, you saw that switching APIs does not change the rest of the workflow. The data preparation, the compile step, the training loop, and the evaluation are identical whether the model is Sequential or functional. Learn the wiring once and the only thing that changes between a simple chain and a complex branching network is the few lines that describe the graph itself.
Next Steps
You can now build any Keras model, straight chain or branching graph. In the next lesson you will put the full TensorFlow workflow to work end to end in a guided project on this same IPO problem.
Continue to Lesson 6 - Guided Project: Predicting IPO Listing Gains with TensorFlow
Apply everything from this module to build, train, and evaluate a complete IPO classifier.
Keep Building Your Skills
You have unlocked the most flexible way to design neural networks in Keras. The next time you sketch a model and the arrows refuse to form a straight line, you will know exactly which API to reach for. Keep the habit of starting simple with Sequential and graduating to the functional API only when the graph demands it. That instinct, matching the tool to the shape of the problem, is what separates someone who follows tutorials from someone who designs models of their own.