Binary Classification of Histopathologic Cancer Images using ResNet18¶

1. Introduction¶

In this project, we approached the binary classification problem of histopathologic scans for detecting metastatic cancer in lymph node sections using the PCam dataset. The dataset consists of image patches labeled as either containing tumor tissue (label 1) or not (label 0). Our goal was to develop a convolutional neural network (CNN) model that classifies unseen test images accurately and produces a Kaggle competition submission.

2. Dataset Description¶

Training images: 220,025 RGB .tif files, each of size 96x96, with corresponding labels provided in train_labels.csv.

Test images: 57,456 unlabeled .tif files.

Labels: Binary classification — 0 (no tumor), 1 (tumor).

Train/Validation split: We used an 80/20 stratified split on the training set.

Downloaded here on kaggle: https://www.kaggle.com/competitions/histopathologic-cancer-detection/data

3. Data Preparation & EDA¶

We created a custom PyTorch Dataset class (PCamDataset) that loads the image file from the disk and maps it to its corresponding label using the CSV. A separate TestPCamDataset was used for the unlabeled test images.

Transformations We applied minimal augmentation to speed up training.

For EDA, we looked at the label distribution and a couple of sample pictures.

4. Model Architecture¶

We created our own set of CNN models, a simple version and a smarter version, but in the end went with ResNet18 pre trained model for a better score.

We used a pretrained ResNet18 model from torchvision.models, replacing the final fully connected layer to match our binary output:

This transfer learning approach allows us to leverage ImageNet feature extraction capabilities while fine-tuning only the final classifier for tumor detection.

5. Training Setup¶

Loss function: CrossEntropyLoss

Optimizer: Adam with learning rate 0.0005

Epochs: 10

Device: CUDA if available, else CPU

Batch size: 64

No early stopping or LR scheduling used

6. Evaluation Metrics¶

Accuracy, Precision, Recall, F1-Score on validation set

Confusion Matrix

ROC Curve and AUC

Results on validation set (sample output):

Accuracy: 0.9625 F1-score (macro avg): 0.9612

However, on Kaggle test set:

Public Score: 0.7348

7. Test Set Predictions¶

The final model was evaluated on the test set using a simplified transform (no augmentation), and predictions were generated as follows:

outputs = model(images) _, preds = torch.max(outputs, 1) submission_df = pd.DataFrame(predictions, columns=['id', 'label']) submission_df.to_csv("submission.csv", index=False)

8. Discussion & Limitations¶

Although the model performed well on validation data (96% accuracy), its Kaggle test score was only 0.7348. This discrepancy suggests potential overfitting or data leakage, or possibly:

Insufficient augmentation or regularization

Train/val split mismatch vs. Kaggle hidden test distribution

Model not capturing subtle texture patterns without deeper architecture

9. Conclusion¶

We demonstrated a working ResNet18 classifier using PyTorch for cancer detection on PCam data. While the model generalized moderately to the test set, future work should include:

Stratified k-fold cross-validation

More aggressive augmentation

Advanced architectures (e.g., EfficientNet, DenseNet)

Regularization techniques like early stopping and LR scheduling

In [42]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="torchvision.models._utils")
In [5]:
import os
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
from torchvision import transforms
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt

# Paths
train_labels_path = 'train_labels.csv'
train_folder = 'train'
test_folder = 'test'

# Load training labels
df = pd.read_csv(train_labels_path)

# Custom Dataset Class 
class PCamDataset(Dataset):
    def __init__(self, dataframe, image_folder, transform=None):
        self.dataframe = dataframe
        self.image_folder = image_folder
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        image_id = self.dataframe.iloc[idx, 0]
        label = self.dataframe.iloc[idx, 1]
        img_path = os.path.join(self.image_folder, image_id + '.tif')
        image = Image.open(img_path)

        if self.transform:
            image = self.transform(image)

        return image, label

# Transforms
transform = transforms.Compose([
    transforms.ToTensor()
])

# Train-validation split
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

# Datasets
train_dataset = PCamDataset(train_df, train_folder, transform=transform)
val_dataset = PCamDataset(val_df, train_folder, transform=transform)

# DataLoaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64)

# EDA: Show label distribution
label_counts = df['label'].value_counts()


sns.barplot(x=label_counts.index, y=label_counts.values)
plt.title('Label Distribution')
plt.xlabel('Label')
plt.ylabel('Count')
plt.show()

sample_images = df.sample(5, random_state=42)

fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for ax, (idx, row) in zip(axes, sample_images.iterrows()):
    image_id = row['id']
    label = row['label']
    image_path = os.path.join(train_folder, image_id + '.tif')
    image = Image.open(image_path)
    ax.imshow(image)
    ax.set_title(f"Label: {label}")
    ax.axis('off')
plt.tight_layout()
plt.show()
No description has been provided for this image
No description has been provided for this image
In [13]:
import torch.nn.functional as F

# Define the CNN architecture
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        
        # Dummy forward to infer flattened size
        dummy_input = torch.zeros(1, 3, 96, 96)  # Adjust this to match your actual image size
        x = self.pool(F.relu(self.conv1(dummy_input)))
        x = self.pool(F.relu(self.conv2(x)))
        self.flattened_size = x.view(1, -1).shape[1]
        
        self.fc1 = nn.Linear(self.flattened_size, 128)
        self.fc2 = nn.Linear(128, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize model, loss, optimizer
model = SimpleCNN()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []
num_epochs = 5

for epoch in range(num_epochs):
    model.train()
    running_loss, correct, total = 0.0, 0, 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

    train_losses.append(running_loss / total)
    train_accuracies.append(correct / total)

    # Validation
    model.eval()
    val_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            val_loss += loss.item() * images.size(0)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    val_losses.append(val_loss / total)
    val_accuracies.append(correct / total)

# Plot training and validation accuracy/loss over time
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2, figsize=(12, 5))

axs[0].plot(train_losses, label="Train Loss")
axs[0].plot(val_losses, label="Val Loss")
axs[0].set_title("Loss over Epochs")
axs[0].set_xlabel("Epoch")
axs[0].set_ylabel("Loss")
axs[0].legend()

axs[1].plot(train_accuracies, label="Train Accuracy")
axs[1].plot(val_accuracies, label="Val Accuracy")
axs[1].set_title("Accuracy over Epochs")
axs[1].set_xlabel("Epoch")
axs[1].set_ylabel("Accuracy")
axs[1].legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

Below we will attempt to implement a smarter version of CNN.

In [15]:
import torch.nn as nn
import torch.nn.functional as F

class SmartCNN(nn.Module):
    def __init__(self):
        super(SmartCNN, self).__init__()
        
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1   = nn.BatchNorm2d(32)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2   = nn.BatchNorm2d(64)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3   = nn.BatchNorm2d(128)

        self.pool  = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.4)

        # Global Average Pool to reduce to (batch_size, 128, 1, 1)
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))

        # Fully connected layer: 128 → 2 (binary classification)
        self.fc = nn.Linear(128, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))  # (32x32 → 16x16)
        x = self.pool(F.relu(self.bn2(self.conv2(x))))  # (16x16 → 8x8)
        x = self.pool(F.relu(self.bn3(self.conv3(x))))  # (8x8 → 4x4)

        x = self.global_pool(x)                         # (4x4 → 1x1)
        x = x.view(x.size(0), -1)                       # flatten to (batch_size, 128)
        x = self.dropout(x)
        x = self.fc(x)                                  # final logits

        return x
# Initialize model, loss, optimizer
model = SmartCNN()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []
num_epochs = 5

for epoch in range(num_epochs):
    model.train()
    running_loss, correct, total = 0.0, 0, 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

    train_losses.append(running_loss / total)
    train_accuracies.append(correct / total)

    # Validation
    model.eval()
    val_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            val_loss += loss.item() * images.size(0)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    val_losses.append(val_loss / total)
    val_accuracies.append(correct / total)

# Plot training and validation accuracy/loss over time
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2, figsize=(12, 5))

axs[0].plot(train_losses, label="Train Loss")
axs[0].plot(val_losses, label="Val Loss")
axs[0].set_title("Loss over Epochs")
axs[0].set_xlabel("Epoch")
axs[0].set_ylabel("Loss")
axs[0].legend()

axs[1].plot(train_accuracies, label="Train Accuracy")
axs[1].plot(val_accuracies, label="Val Accuracy")
axs[1].set_title("Accuracy over Epochs")
axs[1].set_xlabel("Epoch")
axs[1].set_ylabel("Accuracy")
axs[1].legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

below we will show confusion matrix

In [22]:
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Gather all validation predictions and true labels
all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for images, labels in val_loader:
        outputs = model(images.to(device))
        _, preds = torch.max(outputs, 1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.numpy())

cm = confusion_matrix(all_labels, all_preds)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
No description has been provided for this image

below we will show the classification report

In [24]:
from sklearn.metrics import classification_report

print(classification_report(all_labels, all_preds, digits=4))
              precision    recall  f1-score   support

           0     0.9141    0.9473    0.9304     26177
           1     0.9183    0.8693    0.8931     17828

    accuracy                         0.9157     44005
   macro avg     0.9162    0.9083    0.9118     44005
weighted avg     0.9158    0.9157    0.9153     44005

below we will show the roc auc

In [26]:
from sklearn.metrics import roc_curve, auc

# For ROC, need probability scores (for class 1)
probs = []
model.eval()
with torch.no_grad():
    for images, _ in val_loader:
        outputs = model(images.to(device))
        probs.extend(F.softmax(outputs, dim=1)[:,1].cpu().numpy())

fpr, tpr, _ = roc_curve(all_labels, probs)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.4f}")
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image

below we will see some sample predictions

In [28]:
import random

model.eval()
with torch.no_grad():
    batch = next(iter(val_loader))
    images, labels = batch
    outputs = model(images.to(device))
    _, preds = torch.max(outputs, 1)

    fig, axs = plt.subplots(1, 5, figsize=(15, 3))
    for i in range(5):
        axs[i].imshow(images[i].permute(1, 2, 0))
        axs[i].set_title(f"Pred: {preds[i].item()} | True: {labels[i].item()}")
        axs[i].axis('off')
    plt.tight_layout()
    plt.show()
No description has been provided for this image

trying the model with resnet and data augmentation @ 10 epochs

In [36]:
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from torchvision import transforms

# Use pretrained ResNet18 and replace the classifier
model = models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)  # Binary classification

# Data augmentation + normalization for training
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor()
])

# Initialize model, loss, optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)

# Training loop
train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    running_loss, correct, total = 0.0, 0, 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

    train_losses.append(running_loss / total)
    train_accuracies.append(correct / total)

    # Validation
    model.eval()
    val_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            val_loss += loss.item() * images.size(0)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    val_losses.append(val_loss / total)
    val_accuracies.append(correct / total)

# Plot training and validation accuracy/loss over time
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2, figsize=(12, 5))

axs[0].plot(train_losses, label="Train Loss")
axs[0].plot(val_losses, label="Val Loss")
axs[0].set_title("Loss over Epochs")
axs[0].set_xlabel("Epoch")
axs[0].set_ylabel("Loss")
axs[0].legend()

axs[1].plot(train_accuracies, label="Train Accuracy")
axs[1].plot(val_accuracies, label="Val Accuracy")
axs[1].set_title("Accuracy over Epochs")
axs[1].set_xlabel("Epoch")
axs[1].set_ylabel("Accuracy")
axs[1].legend()

plt.tight_layout()
plt.show()

# Gather all validation predictions and true labels for confusion matrix and classification report 
all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for images, labels in val_loader:
        outputs = model(images.to(device))
        _, preds = torch.max(outputs, 1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.numpy())
        
# Classification report
from sklearn.metrics import classification_report, confusion_matrix, roc_curve, auc
print(classification_report(all_labels, all_preds, digits=4))

cm = confusion_matrix(all_labels, all_preds)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()

#ROC
# For ROC, need probability scores (for class 1)
probs = []
model.eval()
with torch.no_grad():
    for images, _ in val_loader:
        outputs = model(images.to(device))
        probs.extend(F.softmax(outputs, dim=1)[:,1].cpu().numpy())

fpr, tpr, _ = roc_curve(all_labels, probs)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.4f}")
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
C:\Users\forca\anaconda3\Lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
C:\Users\forca\anaconda3\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
No description has been provided for this image
              precision    recall  f1-score   support

           0     0.9690    0.9681    0.9685     26177
           1     0.9532    0.9545    0.9538     17828

    accuracy                         0.9625     44005
   macro avg     0.9611    0.9613    0.9612     44005
weighted avg     0.9626    0.9625    0.9626     44005

No description has been provided for this image
No description has been provided for this image
In [ ]: