Lesson 6 - Guided Project: Predicting IPO Listing Gains with PyTorch
Welcome to Your First End-to-End PyTorch Project
This lesson is a guided project that pulls together every skill from this module. Instead of learning one new idea, you will apply many ideas at once: loading real data, engineering a target, building an MLP, training it, and evaluating it honestly. You will work as a data scientist at an investment firm, predicting whether an Indian IPO will list at a gain, and you will learn just as much from where the model struggles as from where it succeeds.
By the end of this lesson, you will be able to:
- Load and explore a real CSV dataset and engineer a binary classification target from a continuous column
- Split data into train and test sets with stratification and scale features without leakage
- Convert pandas data into PyTorch tensors, datasets, and a training loop
- Build, train, and evaluate a multilayer perceptron for binary classification in PyTorch
- Read a confusion matrix and AUC critically, and explain why deep learning does not always beat a simple baseline
This is the capstone of the PyTorch module. You should be comfortable with tensors, nn.Sequential, the training loop, and basic evaluation from the earlier lessons. Let’s build something real.
The Business Problem
When a company goes public, it sets an issue price for its shares. On the first day of trading, the market decides what those shares are actually worth. If demand is strong, the opening price jumps above the issue price, producing a listing gain. If demand is weak, the price stays flat or falls.
Your investment firm does not care about the exact percentage. They care about one decision: should we recommend this IPO or not? That makes this a binary classification problem. Given what you know about an IPO before it lists, will its listing price be above the issue price (1) or not (0)?
It helps to think about why this is genuinely hard. A restaurant’s opening-day popularity depends on location, marketing, and competition, but also on weather, timing, and luck. IPO outcomes are similar: they mix measurable signals (how heavily investors subscribed) with noise no dataset can capture (overall market mood that day). Keep this in mind, because it will shape how you read your results at the end.
Educational use only
This project uses financial data to practice the deep learning workflow. The model you build here is for learning, not for trading. Never use a model like this for real investment decisions without proper financial expertise, risk controls, and regulatory review.
Loading the Data
The dataset contains historical information on IPOs in the Indian market. Unlike the toy datasets bundled with libraries, this is a real CSV file you load with pandas, but once it is in a DataFrame the workflow is the same.
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, roc_auc_score
# download: https://datatweets.com/datasets/indian_ipo.csv
df = pd.read_csv("indian_ipo.csv")
print("Shape:", df.shape)
# Output: Shape: (319, 10)
You have 319 IPOs and 10 columns. That row count is worth pausing on: 319 is a small dataset by deep learning standards. Deep networks usually shine when they have tens of thousands of examples. Remember this number, because it foreshadows the honest conversation at the end of the lesson.
A Data Dictionary
Here are the columns you will work with:
| Column | Type | Meaning |
|---|---|---|
| Date | date | Date the IPO was listed |
| IPOName | text | Name of the company |
| Issue_Size | float | Size of the issue, in INR crores (1 crore = 10 million rupees) |
| Subscription_QIB | float | Times oversubscribed by Qualified Institutional Buyers |
| Subscription_HNI | float | Times oversubscribed by High Net-worth Individuals |
| Subscription_RII | float | Times oversubscribed by Retail Individual Investors |
| Subscription_Total | float | Total times the issue was subscribed |
| Issue_Price | float | Price in INR at which the IPO was issued |
| Listing_Gains_Percent | float | Percent gain of listing price over issue price |
The last column, Listing_Gains_Percent, is the raw outcome. You will turn it into the binary target your firm actually wants.
Engineering the Target
Your firm’s question is yes-or-no, so you convert the continuous gain percentage into a binary label: 1 if the listing gain is positive (the IPO made money on day one) and 0 otherwise.
# 1 if the IPO listed above its issue price, else 0
df["Listing_Gains_Profit"] = (df["Listing_Gains_Percent"] > 0).astype(int)
print(df["Listing_Gains_Profit"].value_counts())
# Output:
# Listing_Gains_Profit
# 1 174
# 0 145
# Name: count, dtype: int64
gain_rate = df["Listing_Gains_Profit"].mean()
print("gain rate:", round(gain_rate, 3))
# Output: gain rate: 0.545
About 54.5 percent of IPOs in this dataset listed at a profit (174 of 319). That balance matters for two reasons. First, the classes are reasonably even, so accuracy is not automatically misleading the way it would be with a 95/5 split. Second, and more importantly, it sets your baseline: a lazy model that always predicts “profit” would be right about 54.5 percent of the time without learning anything. Any model you build has to clear that bar to earn its keep.
Keep this figure in mind as you build: the goal is to do meaningfully better than 54.5 percent, and you will hold the final model up against it.
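One detail of the > 0 rule is easy to miss. An IPO that lists exactly at its issue price (a 0 percent gain) is labeled 0, along with the losers. A minimal sketch on a few made-up rows shows where the boundary falls and why the mean of the label column equals the gain rate. The values are hypothetical and are not taken from indian_ipo.csv.

```python
import pandas as pd

# Hypothetical listing gains (percent), invented for illustration only
toy = pd.DataFrame({"Listing_Gains_Percent": [12.5, 0.0, -3.2, 0.4]})

# Same rule as the lesson: strictly positive gain -> 1, otherwise (including exactly 0) -> 0
toy["Listing_Gains_Profit"] = (toy["Listing_Gains_Percent"] > 0).astype(int)

print(toy["Listing_Gains_Profit"].tolist())  # [1, 0, 0, 1]
# Mean of a 0/1 column = share of positive labels (the lesson's "gain rate")
print(toy["Listing_Gains_Profit"].mean())  # 0.5
```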
The baseline is your yardstick
Before training any model, always compute the accuracy of the simplest possible rule, usually “always predict the majority class.” Here that is 0.545. A model that scores below this number is actively worse than doing nothing, and a model that barely beats it has learned very little. The baseline keeps you honest.
Choosing Features
Not every column belongs in your feature matrix. Date and IPOName are identifiers, not predictors. Listing_Gains_Percent is the very quantity you derived the target from, so including it would leak the answer directly into the model. That leaves the subscription metrics, the issue size, and the issue price as honest predictors.
feature_cols = [
"Issue_Size", "Subscription_QIB", "Subscription_HNI",
"Subscription_RII", "Subscription_Total", "Issue_Price",
]
X = df[feature_cols]
y = df["Listing_Gains_Profit"]
print("X shape:", X.shape)
# Output: X shape: (319, 6)
The intuition is that heavily oversubscribed IPOs, especially those institutional investors clamor for, are more likely to pop on day one. Whether that intuition holds up is exactly what the model will test.
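Leakage often enters through a feature list that grows over time. One cheap habit is to keep an explicit set of forbidden columns and assert that none of them appear in feature_cols before you train. The set name excluded is our own and does not come from the lesson.

```python
# Feature list copied from the lesson
feature_cols = [
    "Issue_Size", "Subscription_QIB", "Subscription_HNI",
    "Subscription_RII", "Subscription_Total", "Issue_Price",
]

# Hypothetical guard: identifiers, the raw outcome, and the engineered target must never be features
excluded = {"Date", "IPOName", "Listing_Gains_Percent", "Listing_Gains_Profit"}

leaked = excluded.intersection(feature_cols)
assert not leaked, f"Leaky or identifier columns in features: {sorted(leaked)}"
print("No excluded columns in feature_cols")
```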
Splitting and Scaling
You hold out a test set the model never sees during training, and you stratify so the 54.5 percent gain rate is preserved in both splits.
X_train, X_test, y_train, y_test = train_test_split(
X, y,
test_size=0.25,
random_state=42,
stratify=y,
)
print("Training observations:", X_train.shape[0])
print("Test observations: ", X_test.shape[0])
# Output:
# Training observations: 239
# Test observations: 80
Neural networks are sensitive to feature scale, and the columns here live on wildly different ranges (issue prices in the hundreds, subscription multiples in the single or double digits). You standardize each feature to mean 0 and standard deviation 1 using the transform
$$z = \frac{x - \mu}{\sigma}$$
where $\mu$ and $\sigma$ are the feature’s mean and standard deviation learned on the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train) # learn stats on TRAIN
X_test_scaled = scaler.transform(X_test) # apply same stats to TEST
Fit the scaler on training data only
Call fit_transform on the training set and transform (never fit) on the test set. Fitting on all the data lets information about the test set leak into training, and your reported accuracy becomes optimistically biased. With only 80 test examples, even a small leak can swing the numbers noticeably.
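You can check the split-then-scale order on synthetic data. This sketch assumes two made-up features on very different scales and uses the lesson’s class counts (174 gains, 145 non-gains). It confirms three things:
- stratification keeps the class ratio close in both splits;
- StandardScaler matches z = (x - μ)/σ with statistics computed on the training rows only, using the population standard deviation (ddof=0);
- the test columns come out only approximately centered, which is what you expect when no test information leaks in.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in with the lesson's class counts: 174 gains, 145 non-gains
y = np.array([1] * 174 + [0] * 145)
# Two made-up features on very different scales (hypothetical, not the real columns)
X = np.column_stack([rng.normal(500, 200, size=319), rng.normal(5, 3, size=319)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(len(y_train), len(y_test))  # 239 80
print(round(y_train.mean(), 3), round(y_test.mean(), 3))  # class ratio roughly preserved

scaler = StandardScaler().fit(X_train)                 # statistics from TRAIN only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
manual = (X_test - mu) / sigma                         # z = (x - mu) / sigma
print(np.allclose(scaler.transform(X_test), manual))   # True

# Train columns are centered by construction; test columns only approximately
print(scaler.transform(X_train).mean(axis=0).round(6))
print(scaler.transform(X_test).mean(axis=0).round(3))
```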
From pandas to PyTorch Tensors
PyTorch trains on tensors, not DataFrames, so you bridge the gap. The features become float32 tensors, and the target becomes a float32 column shaped to match the model’s single output neuron.
X_train_t = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_t = torch.tensor(X_test_scaled, dtype=torch.float32)
# reshape targets to a column so they line up with the model output
y_train_t = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)
y_test_t = torch.tensor(y_test.values, dtype=torch.float32).view(-1, 1)
# wrap training data in a dataset and a shuffling loader
train_ds = TensorDataset(X_train_t, y_train_t)
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
print("Batches per epoch:", len(train_loader))
# Output: Batches per epoch: 15
With 239 training rows and a batch size of 16, each epoch is 15 mini-batches: 14 full batches of 16 plus a final partial batch of 15. Reshuffling the training loader every epoch changes which rows share a mini-batch each time. As a result, the updates do not depend on the order in which the rows happen to sit in the file. You leave the test data as plain tensors because you will evaluate it in one pass.
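The shapes matter because nn.BCELoss expects its target to have the same shape as the model output, which is (batch, 1) here. A small sketch with random stand-in tensors (not the real features) shows what .view(-1, 1) does. It also confirms the batch arithmetic for 239 rows in batches of 16.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Random stand-ins with the lesson's training shape: 239 rows, 6 features
X_train_t = torch.randn(239, 6)
y_1d = torch.randint(0, 2, (239,)).float()  # shape (239,)
y_train_t = y_1d.view(-1, 1)                # shape (239, 1), matching a 1-unit output

train_loader = DataLoader(TensorDataset(X_train_t, y_train_t), batch_size=16, shuffle=True)

batch_sizes = [xb.shape[0] for xb, yb in train_loader]
print(y_1d.shape, y_train_t.shape)  # torch.Size([239]) torch.Size([239, 1])
print(len(train_loader))            # 15
print(batch_sizes[-1])              # 15 (the partial final batch)
```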
Building the MLP
Now design the network. For binary classification you use a multilayer perceptron that widens the 6 input features to 32 units and then funnels down to a single output neuron. A sigmoid at the end squashes the output into a probability between 0 and 1.
torch.manual_seed(42)
model = nn.Sequential(
nn.Linear(6, 32), # 6 input features -> 32 units
nn.ReLU(),
nn.Linear(32, 16),
nn.ReLU(),
nn.Linear(16, 8),
nn.ReLU(),
nn.Linear(8, 1), # single output neuron
nn.Sigmoid(), # probability of listing gain
)
print(model)
# Output:
# Sequential(
# (0): Linear(in_features=6, out_features=32, bias=True)
# (1): ReLU()
# (2): Linear(in_features=32, out_features=16, bias=True)
# (3): ReLU()
# (4): Linear(in_features=16, out_features=8, bias=True)
# (5): ReLU()
# (6): Linear(in_features=8, out_features=1, bias=True)
# (7): Sigmoid()
# )
The funnel shape (32, then 16, then 8) is a common default: wide early layers can mix the raw features in many ways, and narrowing toward the output forces the network to compress what it learns into a compact decision. The ReLU activations give the network the nonlinearity it needs, and the final Sigmoid turns the last neuron’s value into a probability you can threshold at 0.5.
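It is worth counting how many weights this network has to learn. Each Linear(in, out) layer holds in * out weights plus out biases. Rebuilding the same architecture, the sketch below should report 224 + 528 + 136 + 9 = 897 trainable parameters. That is several times more parameters than the 239 training rows, which is one concrete reason a network like this can fit its training set closely without learning anything that generalizes.

```python
import torch.nn as nn

# Same architecture as the lesson
model = nn.Sequential(
    nn.Linear(6, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

# Parameters per module: Linear(in, out) has in*out weights + out biases; ReLU and Sigmoid have none
per_layer = [sum(p.numel() for p in layer.parameters()) for layer in model]
total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(per_layer)  # [224, 0, 528, 0, 136, 0, 9, 0]
print(total)      # 897
```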
Training the Model
For a sigmoid output you pair nn.BCELoss (binary cross-entropy) with the Adam optimizer. Every mini-batch goes through the same five steps you learned earlier: zero the gradients, run a forward pass, compute the loss, back-propagate, and step the optimizer. One epoch is one pass of those steps over all 15 mini-batches.
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = model(batch_X) # forward pass
        loss = criterion(outputs, batch_y)
        loss.backward() # compute gradients
        optimizer.step() # update weights
    if (epoch + 1) % 20 == 0:
        # loss holds only the last mini-batch of this epoch, so treat it as a noisy snapshot
        print(f"Epoch {epoch+1:>3} train loss = {loss.item():.4f}")
# Output:
# Epoch 20 train loss = 0.5189
# Epoch 40 train loss = 0.4471
# Epoch 60 train loss = 0.4012
# Epoch 80 train loss = 0.3611
# Epoch 100 train loss = 0.3382
The training loss falls at every checkpoint, from 0.5189 at epoch 20 to 0.3382 at epoch 100. Each printed value comes from the last mini-batch of that epoch, so it is noisy, but the downward trend is clear. That looks encouraging: the model is clearly fitting the training data. However, a falling training loss only tells you the model is fitting examples it has already seen. The real question is whether it learned anything that transfers to IPOs it has never encountered. For that, you go to the test set.
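To put a loss of 0.3382 in context, it helps to see what binary cross-entropy computes. The sketch below uses a handful of hypothetical probabilities and checks nn.BCELoss against the formula. It also shows two reference points:
- A model that predicts 0.5 for everything scores ln 2 ≈ 0.693, whatever the labels.
- nn.BCEWithLogitsLoss applied to raw logits gives the same value as nn.BCELoss.

PyTorch's documentation recommends that combined version for numerical stability. Using it would mean dropping the final Sigmoid from the model and applying torch.sigmoid only when you want probabilities.

```python
import math
import torch
import torch.nn as nn

# Hypothetical predicted probabilities and true labels, shaped (batch, 1)
probs = torch.tensor([[0.9], [0.6], [0.3], [0.5]])
targets = torch.tensor([[1.0], [1.0], [0.0], [1.0]])

# Binary cross-entropy by hand: mean of -[y*log(p) + (1-y)*log(1-p)]
manual = -(targets * probs.log() + (1 - targets) * (1 - probs).log()).mean()
builtin = nn.BCELoss()(probs, targets)
print(torch.allclose(manual, builtin))  # True

# A model that always outputs 0.5 scores ln(2), about 0.693, whatever the labels
print(round(math.log(2), 4))  # 0.6931

# BCEWithLogitsLoss fuses the sigmoid into the loss and matches BCELoss on the same predictions
logits = torch.logit(probs)
print(torch.allclose(nn.BCEWithLogitsLoss()(logits, targets), builtin, atol=1e-6))  # True
```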
A falling loss is not a finish line
It is easy to celebrate a decreasing training loss, but training loss measures fit on data the model is allowed to study. With a small dataset and a flexible network, the loss can keep dropping while the model quietly overfits. Always confirm performance on held-out data before drawing any conclusions.
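Two refinements make the training printout more trustworthy:
- average the loss over every mini-batch in an epoch instead of reporting only the last one;
- track loss on data the model does not train on.

The sketch below shows both on synthetic, mostly-noise data with a smaller, hypothetical model. On the real project you would carve the held-out slice out of the training set, never the test set, so the test set stays untouched until the end. A held-out loss that stops falling or starts rising while the training loss keeps dropping is the overfitting signal described above.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Synthetic, mostly-noise data standing in for the IPO features (hypothetical)
X = torch.randn(319, 6)
y = (X[:, :1] * 0.3 + torch.randn(319, 1) > 0).float()  # weak signal in feature 0
X_tr, y_tr, X_val, y_val = X[:239], y[:239], X[239:], y[239:]

loader = DataLoader(TensorDataset(X_tr, y_tr), batch_size=16, shuffle=True)
model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

history = []  # (mean training loss, held-out loss) per epoch
for epoch in range(100):
    model.train()
    running, seen = 0.0, 0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        running += loss.item() * xb.shape[0]  # weight by batch size (the last batch is smaller)
        seen += xb.shape[0]
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()
    history.append((running / seen, val_loss))

for epoch in (19, 59, 99):
    train_loss, val_loss = history[epoch]
    print(f"Epoch {epoch + 1:>3} train {train_loss:.4f} held-out {val_loss:.4f}")
```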
Evaluating Honestly
Switch the model to evaluation mode, turn off gradient tracking, and produce predictions for the test set. Probabilities above 0.5 become a predicted “profit” (1).
model.eval()
with torch.no_grad():
    probs = model(X_test_t) # predicted probabilities
    preds = (probs > 0.5).float() # threshold at 0.5
accuracy = (preds == y_test_t).float().mean().item()
auc = roc_auc_score(y_test_t.numpy(), probs.numpy())
print(f"Test accuracy: {accuracy:.3f}")
print(f"Test AUC: {auc:.3f}")
# Output:
# Test accuracy: 0.562
# Test AUC: 0.618
Here is the moment of truth. The model scores 0.562 accuracy on the test set. The majority-class baseline was 0.545, and on this stratified test set itself, always predicting “profit” would score 44 of 80, or 0.55. Your deep network beat doing nothing by less than two percentage points. The AUC of 0.618 tells a similar story. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, so 0.618 means the model has a faint ability to rank profitable IPOs above unprofitable ones. That is nothing close to a reliable signal, and with only 80 test IPOs even that edge is uncertain.
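AUC has a concrete reading: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting half. So 0.618 means the model ranks a random winner above a random non-winner about 62 percent of the time. A minimal sketch on four hypothetical examples computes that pairwise fraction by hand and checks it against roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and predicted probabilities
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# AUC = share of (positive, negative) pairs where the positive scores higher (ties count 1/2)
pos, neg = scores[y_true == 1], scores[y_true == 0]
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
manual_auc = float(np.mean(pairs))

print(manual_auc)                     # 0.75
print(roc_auc_score(y_true, scores))  # 0.75
```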
The confusion matrix makes the behavior concrete. It cross-tabulates what actually happened against what the model predicted.
cm = confusion_matrix(y_test_t.numpy(), preds.numpy())
tn, fp, fn, tp = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
print("Confusion matrix:")
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
# Output:
# Confusion matrix:
# [[16 20]
# [15 29]]
# TN=16 FP=20 FN=15 TP=29
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
# Output:
# Precision: 0.592
# Recall: 0.659
Read the four cells one at a time. Of the 80 test IPOs, the model correctly flagged 29 winners (true positives) and 16 non-winners (true negatives). But it made 20 false positives, predicting a gain that never came, and 15 false negatives, missing real winners. In business terms, those 20 false positives are the painful ones: each is an IPO the model would have recommended that actually disappointed. Precision of 0.592 means just under six in ten of the model’s “buy” calls panned out. That is only a little better than the 55 percent of test IPOs (44 of 80) that listed at a gain anyway. Recall of 0.659 means the model caught about two-thirds (29 of 44) of the genuine winners.
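Rebuilding label vectors that reproduce the lesson’s counts is a quick way to check both sklearn’s layout and the arithmetic. The arrays below are constructed from the reported cells and are not the model’s actual predictions. confusion_matrix puts actual classes on the rows and predicted classes on the columns, in the order 0 then 1, so ravel() returns TN, FP, FN, TP.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Label vectors rebuilt from the lesson's counts: 36 actual non-gains, 44 actual gains
y_true = np.array([0] * 36 + [1] * 44)
y_pred = np.array([0] * 16 + [1] * 20     # actual 0: 16 TN, 20 FP
                  + [0] * 15 + [1] * 29)  # actual 1: 15 FN, 29 TP

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()               # rows = actual, columns = predicted
print(cm.tolist())                        # [[16, 20], [15, 29]]
print(tn, fp, fn, tp)                     # 16 20 15 29
print(round(precision_score(y_true, y_pred), 3))  # 0.592
print(round(recall_score(y_true, y_pred), 3))     # 0.659
# Test-set base rate of gains, then the model's accuracy
print(y_true.mean(), (y_true == y_pred).mean())   # 0.55 0.5625
```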
Why Deep Learning Did Not Win Here
This is the most valuable part of the project, so do not skip past it. A 0.562 model is not a failure of your code; it is an honest result, and understanding why it happened is exactly the skill that separates practitioners from button-pushers.
- The dataset is tiny. 319 rows, of which only 239 are for training, is far too little for a deep network to find robust patterns. Deep learning is data-hungry, and with this few examples the model can memorize the training set long before it learns anything that generalizes.
- The features are weak. Subscription numbers and issue size carry only a faint relationship to first-day price. The strongest single predictor barely correlates with the target, so there simply is not much signal for any model to extract.
- The outcome is noisy. First-day IPO returns depend heavily on the overall market mood that morning, news, and pure sentiment, none of which appear in these columns. No model, deep or otherwise, can predict what the data does not contain.
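A quick way to check a claim like “the strongest single predictor barely correlates with the target” is to rank features by the absolute value of their correlation with the 0/1 label. On the lesson’s data that would be df[feature_cols].corrwith(df["Listing_Gains_Profit"]). The sketch below runs the same call on synthetic data with one deliberately strong and one deliberately weak feature, so the numbers it prints say nothing about the real IPOs. Correlation only measures linear association, so a low value does not rule out every nonlinear pattern, but it is a fast first look.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data: one feature strongly tied to the label, one barely (hypothetical)
target = pd.Series(rng.integers(0, 2, size=n), name="target")
features = pd.DataFrame({
    "strong": target + rng.normal(0, 0.5, size=n),
    "weak": 0.1 * target + rng.normal(0, 1.0, size=n),
})

# Pearson correlation with a 0/1 target (the point-biserial correlation), ranked by magnitude
corr = features.corrwith(target).abs().sort_values(ascending=False)
print(corr.round(3))
```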
In short, a problem with little signal and little data does not get rescued by a bigger model. A simple logistic regression on these same features would very likely land in roughly the same place, and that is the point: model power cannot manufacture information that is not there.
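Testing that claim takes only a few lines. This sketch fits a scaled logistic regression and the majority-class baseline on a synthetic, small, noisy dataset shaped like the lesson’s. On the real project you would instead fit LogisticRegression on X_train_scaled and y_train and compare its test accuracy and AUC with the MLP’s 0.562 and 0.618. Putting the scaler inside the pipeline keeps it fitted on the training split only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic small, noisy dataset shaped like the lesson's (319 rows, 6 features); hypothetical
X = rng.normal(size=(319, 6))
y = (0.4 * X[:, 0] + rng.normal(size=319) > 0).astype(int)  # weak signal in one column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# The scaler inside the pipeline is fit on the training split only
logreg = make_pipeline(StandardScaler(), LogisticRegression())
logreg.fit(X_train, y_train)

majority = np.bincount(y_train).argmax()  # majority class in the training labels
print("baseline accuracy:", round(accuracy_score(y_test, np.full_like(y_test, majority)), 3))
print("logreg accuracy:  ", round(accuracy_score(y_test, logreg.predict(X_test)), 3))
print("logreg AUC:       ", round(roc_auc_score(y_test, logreg.predict_proba(X_test)[:, 1]), 3))
```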
When deep learning is the wrong tool
Reach for deep learning when you have lots of data and rich, high-signal features, such as images, audio, or long text. For small tabular datasets with weak features, simpler models are usually just as accurate, far faster to train, and much easier to explain. Choosing the right tool is itself a core skill.
What Would Actually Help
If your firm wanted a better model, more layers would not be the answer. More and better data would. Collecting thousands more IPOs, adding richer features (sector, market conditions on the listing day, broader index movement, anchor-investor details), and engineering smarter inputs would all do more than any architecture change. You could also try adjusting the decision threshold away from 0.5 to trade precision for recall depending on whether your firm fears missed winners or bad recommendations more. The model is a lens; it can only sharpen the signal that already exists in the data.
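Moving the threshold is easy to explore once precision and recall are wrapped in a helper. The function name precision_recall_at and the toy probabilities below are hypothetical. The sketch only illustrates the typical pattern, in which raising the threshold trades recall for precision.

```python
import numpy as np


def precision_recall_at(y_true, probs, threshold):
    """Precision and recall when predicting 1 for probabilities above threshold."""
    preds = (probs > threshold).astype(int)
    tp = int(np.sum((preds == 1) & (y_true == 1)))
    fp = int(np.sum((preds == 1) & (y_true == 0)))
    fn = int(np.sum((preds == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else float("nan")  # undefined with no positive calls
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical labels and model probabilities
y_true = np.array([0, 1, 0, 1, 0, 1])
probs = np.array([0.2, 0.4, 0.55, 0.65, 0.7, 0.9])

for t in (0.5, 0.6, 0.8):
    p, r = precision_recall_at(y_true, probs, t)
    print(f"threshold {t}: precision {p:.3f} recall {r:.3f}")
# threshold 0.5: precision 0.500 recall 0.667
# threshold 0.6: precision 0.667 recall 0.667
# threshold 0.8: precision 1.000 recall 0.333
```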
Practice Exercises
Now it is your turn. These build directly on the project, so work them on the same data you just loaded.
Exercise 1: Compare Against a Simple Baseline
Build the simplest possible model: always predict the majority class. Compute its accuracy on the test set and compare it to the MLP’s 0.562. This is the comparison that puts your deep model in context.
# Your code here (reuse y_train and y_test from the lesson)
Hint
Find the majority class with y_train.mode()[0] (it will be 1). Then the baseline accuracy is just the fraction of the test set in that class: (y_test == 1).mean(). You should get about 0.55, only a hair below the MLP, which is the whole lesson in one number.
Exercise 2: Read Precision and Recall From the Matrix
Using the confusion matrix counts (TN=16, FP=20, FN=15, TP=29), compute precision and recall by hand, then explain in one sentence which error type, false positives or false negatives, would cost your investment firm more.
tn, fp, fn, tp = 16, 20, 15, 29
# Your code here
Hint
Precision is tp / (tp + fp) and recall is tp / (tp + fn). You should get 0.592 and 0.659. False positives are recommendations that lose money, so a risk-averse firm usually fears those most and would favor higher precision, even at the cost of recall.
Exercise 3: Try a Different Threshold
The default decision threshold is 0.5. Re-threshold the model’s predicted probabilities at 0.6 instead and recompute the confusion matrix. Notice how raising the threshold makes the model more cautious about predicting a gain.
# Your code here (reuse probs and y_test_t from the lesson)
Hint
Replace the threshold with preds = (probs > 0.6).float(), then rebuild the matrix with confusion_matrix(y_test_t.numpy(), preds.numpy()). A higher threshold means fewer positive predictions overall, which typically lowers false positives (raising precision) while raising false negatives (lowering recall).
Summary
You built a complete deep learning project from a raw CSV to an honest evaluation, and you learned to read the result critically rather than chase a number. Let’s review.
Key Concepts
The End-to-End Workflow
- Load real data, engineer a target, select honest features, split, scale, build, train, and evaluate
- A continuous column like Listing_Gains_Percent becomes a binary target with a simple threshold
- Drop identifier columns and any column derived from the target to avoid leakage
PyTorch Mechanics
- Convert scaled pandas data to float32 tensors and reshape targets to a column with .view(-1, 1)
- Wrap training data in a TensorDataset and a shuffling DataLoader
- Build an MLP with nn.Sequential, funneling to one Sigmoid output for binary classification
- Train with nn.BCELoss and Adam over the standard five-step loop
Honest Evaluation
- A falling training loss (here 0.3382) only proves the model fit the training data
- Always compare test accuracy (0.562) against the majority-class baseline (0.545)
- AUC (0.618) measures ranking quality, from 0.5 for random to 1.0 for perfect
- A confusion matrix (TN=16, FP=20, FN=15, TP=29) reveals the kind of mistakes a model makes
- Precision (0.592) and recall (0.659) translate those mistakes into business meaning
Why This Matters
The most important lesson here is that deep learning is not magic. With only 319 noisy examples and weak features, your carefully built MLP barely edged out a model that always guesses the majority class. That is not a bug in your code; it is the data telling you the truth. A real practitioner’s instinct is to recognize this quickly, compare against a simple baseline, and conclude that the answer is better data and better features, not a bigger network.
This judgment is what employers actually pay for. Anyone can stack layers until a loss curve goes down. Knowing when a problem is solvable, when a simpler model would do, and how to read precision and recall in business terms is the difference between producing numbers and producing decisions. You now have the full workflow and, just as importantly, the skepticism to use it well.
Next Steps
You have completed the PyTorch module by shipping a full project end to end. The natural next move is to see how the same ideas look in a different framework, then revisit the module map to consolidate what you have learned.
Continue to Deep Learning with TensorFlow
Build the same kind of models in the parallel TensorFlow module and compare the two frameworks.
Back to Module Overview
Return to the Deep Learning with PyTorch module overview.
Keep Building Your Skills
You just did the real job of a data scientist: you took a messy, real-world question, built an honest pipeline, and reported a result you could defend, even when that result was modest. That intellectual honesty, comparing against baselines and refusing to oversell a model, is worth more than any single technique. Carry it into every project you build from here, and the tools you learned in this module will serve you well.