Apply CNN for Image Segmentation

Aim

To implement a Convolutional Neural Network (CNN) for image segmentation, demonstrating pixel-wise classification using Python and Keras on a sample dataset.


Algorithm

  1. Dataset Loading: Use a segmentation dataset (e.g., the Oxford-IIIT Pet dataset or a custom set); a loading sketch follows this list.

  2. Preprocessing: Resize images and their corresponding masks; normalize images; encode segmentation masks (illustrated in the sketch after this list).

  3. Model Definition:

    • Use an encoder-decoder CNN (such as U-Net or a simpler variant).

    • Encoder extracts spatial features via convolution and pooling.

    • Decoder upsamples using transposed convolutions (Conv2DTranspose) to restore spatial resolution for pixel-wise classification.

    • Output layer with softmax or sigmoid activation for multi-class or binary segmentation.

  4. Compilation: Use an appropriate loss function such as categorical cross-entropy or Dice loss, with an optimizer like Adam (a Dice loss sketch follows the program below).

  5. Training: Fit the model using images and masks.

  6. Evaluation: Use metrics such as Intersection-over-Union (IoU) or pixel accuracy (an IoU sketch follows the program below).

  7. Prediction & Visualization: Generate and visualize segmentation masks on test images.
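
As a concrete sketch of steps 1 and 2, the snippet below loads the Oxford-IIIT Pet dataset through the tensorflow_datasets package and resizes, normalizes, and re-encodes each (image, mask) pair. This is illustrative, not part of the main program: it assumes tensorflow_datasets is installed, and the preprocess helper name, the 128x128 target size, and the batch size are arbitrary choices.

python
import tensorflow as tf
import tensorflow_datasets as tfds  # assumed installed (pip install tensorflow-datasets)

# Oxford-IIIT Pet ships with per-pixel segmentation masks labelled
# 1 (pet), 2 (background), and 3 (border)
dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)

def preprocess(sample):  # illustrative helper, not a dataset API
    # Nearest-neighbour resizing keeps mask values valid class labels
    image = tf.image.resize(sample['image'], (128, 128))
    mask = tf.image.resize(sample['segmentation_mask'], (128, 128), method='nearest')
    image = tf.cast(image, tf.float32) / 255.0  # normalize pixels to [0, 1]
    mask = tf.cast(mask, tf.int32) - 1          # re-encode labels 1..3 as 0..2
    return image, mask

train_ds = dataset['train'].map(preprocess).batch(16)

For the binary model in the program below, the three resulting mask classes would additionally be merged into foreground/background.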


Program (Python - Simplified Keras U-Net)

python
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

# Example: simple U-Net style architecture for 128x128 images, binary segmentation

def conv_block(input_tensor, num_filters):
    # Two 3x3 convolutions with ReLU; padding='same' preserves spatial size
    x = layers.Conv2D(num_filters, 3, activation='relu', padding='same')(input_tensor)
    x = layers.Conv2D(num_filters, 3, activation='relu', padding='same')(x)
    return x

def encoder_block(input_tensor, num_filters):
    # Convolve, then halve the spatial resolution with 2x2 max pooling
    x = conv_block(input_tensor, num_filters)
    p = layers.MaxPooling2D((2, 2))(x)
    return x, p

def decoder_block(input_tensor, concat_tensor, num_filters):
    # Upsample 2x with a transposed convolution, then fuse the skip connection
    x = layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor)
    x = layers.concatenate([x, concat_tensor])
    x = conv_block(x, num_filters)
    return x

# Build U-Net model
inputs = layers.Input((128, 128, 3))

# Encoder
c1, p1 = encoder_block(inputs, 64)
c2, p2 = encoder_block(p1, 128)
c3, p3 = encoder_block(p2, 256)

# Bottleneck
b1 = conv_block(p3, 512)

# Decoder (mirrors the encoder, reusing its feature maps via skip connections)
d1 = decoder_block(b1, c3, 256)
d2 = decoder_block(d1, c2, 128)
d3 = decoder_block(d2, c1, 64)

# 1x1 convolution with sigmoid gives a per-pixel foreground probability
outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(d3)

model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Example dummy data for demonstration
X_train = np.random.rand(10, 128, 128, 3)
Y_train = np.random.randint(0, 2, (10, 128, 128, 1))

# Train the model (few epochs for demonstration)
model.fit(X_train, Y_train, epochs=3, batch_size=2)

# Predict on a training sample
pred_mask = model.predict(X_train[0:1])

# Visualize the original image and the predicted mask side by side
plt.subplot(1, 2, 1)
plt.title('Input Image')
plt.imshow(X_train[0])
plt.subplot(1, 2, 2)
plt.title('Predicted Mask')
plt.imshow(pred_mask[0, :, :, 0], cmap='gray')
plt.show()
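
Step 4 of the algorithm also mentions Dice loss, and step 6 mentions IoU; neither appears in the program above. The sketch below shows one common formulation of each for this binary, sigmoid-output model, reusing the model, X_train, and Y_train defined above; the smoothing constant and the 0.5 threshold are conventional choices rather than fixed requirements.

python
import numpy as np
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1.0):
    # Dice loss = 1 - Dice coefficient; smooth guards against division by zero
    y_true = tf.cast(y_true, tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred)
    total = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)

# Optionally recompile the model above with Dice loss instead of cross-entropy
model.compile(optimizer='adam', loss=dice_loss, metrics=['accuracy'])

def iou_score(y_true, y_pred, threshold=0.5):
    # Binarize the sigmoid probabilities, then IoU = |A ∩ B| / |A ∪ B|
    y_pred = (y_pred > threshold).astype(np.uint8)
    y_true = y_true.astype(np.uint8)
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return intersection / union if union > 0 else 1.0

pred_masks = model.predict(X_train)
print('IoU on training data:', iou_score(Y_train, pred_masks))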

Output

  • Training loss and accuracy are printed per epoch; a decreasing loss indicates the model is fitting the (dummy) data.

  • The predicted segmentation mask is displayed alongside the original image.

  • The mask shows per-pixel sigmoid probabilities in [0, 1] for the foreground class; thresholding at 0.5 yields the binary (0/1) predictions, as sketched below.
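
A minimal way to recover the hard 0/1 mask from the continuous sigmoid output, reusing pred_mask from the program above:

python
import numpy as np
import matplotlib.pyplot as plt

# Threshold per-pixel probabilities at 0.5 to get a binary foreground mask
binary_mask = (pred_mask[0, :, :, 0] > 0.5).astype(np.uint8)
plt.imshow(binary_mask, cmap='gray')
plt.title('Thresholded Mask')
plt.show()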




Result

  • The CNN successfully learns to segment images pixel-wise using an encoder-decoder network.

  • This simplified U-Net example demonstrates how spatial resolution is progressively reduced by the encoder and then restored by the decoder to predict per-pixel labels; the layer summary below makes this visible.

  • Visualizations illustrate the model’s ability to distinguish regions of interest.
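
To see the progressive reduction and restoration of spatial resolution described above, Keras can list every layer's output shape for the model built in the program:

python
# Output shapes run 128x128 -> 64 -> 32 -> 16 (bottleneck) and back up to 128x128
model.summary()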