Apply CNN for Image Segmentation
Aim
To implement a Convolutional Neural Network (CNN) for image segmentation, demonstrating pixel-wise classification using Python and Keras on a sample dataset.
Algorithm
-
Dataset Loading: Use a segmentation dataset (e.g., Oxford Pets dataset or a custom set).
-
Preprocessing: Resize images and corresponding masks; normalize images; encode segmentation masks.
-
Model Definition:
-
Use an encoder-decoder CNN (like U-Net or a simple CNN).
-
Encoder extracts spatial features via convolution and pooling.
-
Decoder upsamples using transposed convolutions (Conv2DTranspose) to produce pixel-wise classification.
-
Output layer with softmax or sigmoid activation for multi-class or binary segmentation.
-
-
Compilation: Use an appropriate loss function such as categorical cross-entropy or Dice loss, with an optimizer like Adam.
-
Training: Fit the model using images and masks.
-
Evaluation: Use metrics such as Intersection-over-Union (IoU) or pixel accuracy.
-
Prediction & Visualization: Generate and visualize segmentation masks on test images.
Program (Python - Keras U-Net Style Simplified)
pythonimport tensorflow as tf from tensorflow.keras import layers, models import numpy as np import matplotlib.pyplot as plt # Example: Simple U-Net style Architecture for 128x128 images, binary segmentation def conv_block(input_tensor, num_filters): x = layers.Conv2D(num_filters, 3, activation='relu', padding='same')(input_tensor) x = layers.Conv2D(num_filters, 3, activation='relu', padding='same')(x) return x def encoder_block(input_tensor, num_filters): x = conv_block(input_tensor, num_filters) p = layers.MaxPooling2D((2, 2))(x) return x, p def decoder_block(input_tensor, concat_tensor, num_filters): x = layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor) x = layers.concatenate([x, concat_tensor]) x = conv_block(x, num_filters) return x # Build U-Net Model inputs = layers.Input((128, 128, 3)) # Encoder c1, p1 = encoder_block(inputs, 64) c2, p2 = encoder_block(p1, 128) c3, p3 = encoder_block(p2, 256) # Bottleneck b1 = conv_block(p3, 512) # Decoder d1 = decoder_block(b1, c3, 256) d2 = decoder_block(d1, c2, 128) d3 = decoder_block(d2, c1, 64) outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(d3) model = models.Model(inputs, outputs) model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # Example dummy data for demonstration X_train = np.random.rand(10, 128, 128, 3) Y_train = np.random.randint(0, 2, (10, 128, 128, 1)) # Train the model (for demo, epochs small) model.fit(X_train, Y_train, epochs=3, batch_size=2) # Predict on training sample pred_mask = model.predict(X_train[0:1]) # Visualize original image and predicted mask plt.subplot(1,2,1) plt.title('Input Image') plt.imshow(X_train[0]) plt.subplot(1,2,2) plt.title('Predicted Mask') plt.imshow(pred_mask[0,:,:,0], cmap='gray') plt.show()
Output
-
Successful training loss decrease and accuracy values printed per epoch.
-
Predicted segmentation mask is displayed alongside an original image.
-
The mask shows pixel-wise predictions (0 or 1) for the foreground class.
Result
-
The CNN successfully learns to segment images pixel-wise using the encoder-decoder style network.
-
This simplified U-Net model example displays how image spatial resolution is progressively reduced and then restored to predict per-pixel labels.
-
Visualizations illustrate the model’s ability to distinguish regions of interest.
Join the conversation