Week 5 project: GANs¶
This notebook solves the I’m Something of a Painter Myself Kaggle challenge. We are given normal photos along with Monet paintings, and instructed to transform the photos such that they take the style of the Monet paintings.
To be precise, we technically do not need to use the provided photos - the ultimate goal is simply to produce Monet-style images. Nevertheless, we will be transferring the Monet style onto the photos, as that is more interesting. CycleGAN will be used, as it was designed with exactly that goal in mind. But first, let's introduce GANs.
GAN stands for Generative Adversarial Network. It is called Generative because it has a generator component that generates output (usually images) from input. It is called Adversarial because there is an adversarial component called the discriminator which tries to distinguish the generated output from real images. The discriminator's result is used to train the generator and vice versa.
CycleGAN is a special class of GAN which uses four models instead of two: in our case, there will be a generator that transforms a regular photo to Monet style, and another that transforms Monet style back to a regular photo, and each generator has a corresponding discriminator. The model has two generators for a reason: the goal is to transfer the style only, so ideally an image passed through both generators (a round trip) should look just like the original image. This way of training is very suitable in our case as it does not require paired images.
Imagine the analogy of an English-to-French translator. The first generator translates the text to French, and the other translates French back to English. Each discriminator validates that the translated text is indeed French or English, respectively. If the translator is working properly, a text translated from English to French and then back to English should be almost the same as the original English text.
The first part of this project, which is the extraction of the data and training the CycleGAN, is built upon Kaggle's tutorial with some additions (the EDA section). My main contribution will be comparing it to a modified version I call Styled CycleGAN, which adds extra loss functions.
Introduction and Setup¶
This notebook utilizes a CycleGAN architecture to apply the Monet style to photos. For this tutorial, we will be using the TFRecord dataset. Import the following packages and configure the GPU.
For more information, check out TensorFlow and Keras CycleGAN documentation pages.
from os import path, makedirs, environ
import shutil
from PIL import Image
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
if 'KAGGLE_URL_BASE' in environ:
from kaggle_datasets import KaggleDatasets
import matplotlib.pyplot as plt
import numpy as np
from skimage.metrics import structural_similarity as ssim
import cv2
# Configure GPU
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
print("Using GPU:", gpus[0])
strategy = tf.distribute.MirroredStrategy()
except RuntimeError as e:
print(e)
else:
strategy = tf.distribute.get_strategy()
print('Number of replicas:', strategy.num_replicas_in_sync)
AUTOTUNE = tf.data.experimental.AUTOTUNE
print(tf.__version__)
Using GPU: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
Number of replicas: 1
2.20.0
Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4622 MB memory (NVIDIA GeForce GTX 1660 Ti, compute capability 7.5)
Load in the data¶
We want to keep our photo dataset and our Monet dataset separate. First, load in the filenames of the TFRecords.
GCS_PATH = KaggleDatasets().get_gcs_path() if 'KAGGLE_URL_BASE' in environ else "data"
IMG_PATH = "../tmp" if 'KAGGLE_URL_BASE' in environ else "tmp"
ZIP_NAME = "/kaggle/working/images" if 'KAGGLE_URL_BASE' in environ else "images"
makedirs(IMG_PATH, exist_ok=True)
MONET_FILENAMES = tf.io.gfile.glob(str(GCS_PATH + '/monet_tfrec/*.tfrec'))
print('Monet TFRecord Files:', len(MONET_FILENAMES))
PHOTO_FILENAMES = tf.io.gfile.glob(str(GCS_PATH + '/photo_tfrec/*.tfrec'))
print('Photo TFRecord Files:', len(PHOTO_FILENAMES))
Monet TFRecord Files: 5
Photo TFRecord Files: 20
All the images for the competition are already sized to 256x256. As these images are RGB images, set the channel count to 3. Additionally, we need to normalize the images to the [-1, 1] range. Because we are building a generative model, we don't need the labels or the image id, so we'll only return the image from the TFRecord.
IMAGE_SIZE = [256, 256]
def decode_image(image):
image = tf.image.decode_jpeg(image, channels=3)
image = (tf.cast(image, tf.float32) / 127.5) - 1
image = tf.reshape(image, [*IMAGE_SIZE, 3])
return image
def read_tfrecord(example):
tfrecord_format = {
"image_name": tf.io.FixedLenFeature([], tf.string),
"image": tf.io.FixedLenFeature([], tf.string),
"target": tf.io.FixedLenFeature([], tf.string)
}
example = tf.io.parse_single_example(example, tfrecord_format)
image = decode_image(example['image'])
return image
Define the function to extract the image from the files.
def load_dataset(filenames, labeled=True, ordered=False):
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(read_tfrecord, num_parallel_calls=AUTOTUNE)
return dataset
Let's load in our datasets.
monet_ds = load_dataset(MONET_FILENAMES, labeled=True).batch(1)
photo_ds = load_dataset(PHOTO_FILENAMES, labeled=True).batch(1)
example_monet = next(iter(monet_ds))
example_photo = next(iter(photo_ds))
Let's visualize a photo example and a Monet example.
plt.subplot(121)
plt.title('Photo')
plt.imshow(example_photo[0] * 0.5 + 0.5)
plt.subplot(122)
plt.title('Monet')
plt.imshow(example_monet[0] * 0.5 + 0.5)
<matplotlib.image.AxesImage at 0x7f8e8089bed0>
Exploratory Data Analysis (EDA)¶
Before building our models, let's analyze the dataset to understand its characteristics and properties.
def analyze_dataset(dataset, name):
"""Comprehensive analysis of the dataset"""
print("\n=== %s Dataset Analysis ===" % name)
# Count number of samples
num_samples = 0
for _ in dataset:
num_samples += 1
print("Number of samples: %d" % num_samples)
# Analyze first few images
sample_count = 0
pixel_stats = []
for img in dataset.take(20):  # Analyze the first 20 images
img_np = img.numpy()
pixel_stats.append(img_np)
sample_count += 1
if sample_count <= 3: # Detailed analysis for first 3 images
print("\nSample %d:" % sample_count)
print("\tShape: %s" % (img_np.shape,))
print("\tData type: %s" % img_np.dtype)
print("\tValue range: [%.3f, %.3f]" % (img_np.min(), img_np.max()))
print("\tMean: %.3f, Std: %.3f" % (img_np.mean(), img_np.std()))
# Pixel intensity distribution
unique, counts = np.unique((img_np * 127.5 + 127.5).astype(np.uint8), return_counts=True)
print("\tUnique pixel values: %d" % len(unique))
# Overall statistics
if pixel_stats:
all_pixels = np.concatenate([img.flatten() for img in pixel_stats])
print("\nOverall Statistics (%d images):" % sample_count)
print("\tGlobal min: %.3f" % all_pixels.min())
print("\tGlobal max: %.3f" % all_pixels.max())
print("\tGlobal mean: %.3f" % all_pixels.mean())
print("\tGlobal std: %.3f" % all_pixels.std())
return num_samples
# Analyze both datasets
monet_count = analyze_dataset(load_dataset(MONET_FILENAMES), "Monet")
photo_count = analyze_dataset(load_dataset(PHOTO_FILENAMES), "Photo")
=== Monet Dataset Analysis ===
Number of samples: 300

Sample 1:
    Shape: (256, 256, 3)
    Data type: float32
    Value range: [-1.000, 0.945]
    Mean: -0.123, Std: 0.517
    Unique pixel values: 248

Sample 2:
    Shape: (256, 256, 3)
    Data type: float32
    Value range: [-1.000, 1.000]
    Mean: 0.189, Std: 0.373
    Unique pixel values: 250

Sample 3:
    Shape: (256, 256, 3)
    Data type: float32
    Value range: [-1.000, 0.867]
    Mean: 0.135, Std: 0.283
    Unique pixel values: 235

Overall Statistics (20 images):
    Global min: -1.000
    Global max: 1.000
    Global mean: 0.048
    Global std: 0.429

=== Photo Dataset Analysis ===
Number of samples: 7038

Sample 1:
    Shape: (256, 256, 3)
    Data type: float32
    Value range: [-1.000, 1.000]
    Mean: 0.181, Std: 0.487
    Unique pixel values: 256

Sample 2:
    Shape: (256, 256, 3)
    Data type: float32
    Value range: [-1.000, 1.000]
    Mean: -0.034, Std: 0.452
    Unique pixel values: 256

Sample 3:
    Shape: (256, 256, 3)
    Data type: float32
    Value range: [-1.000, 1.000]
    Mean: -0.441, Std: 0.450
    Unique pixel values: 256

Overall Statistics (20 images):
    Global min: -1.000
    Global max: 1.000
    Global mean: -0.117
    Global std: 0.530
# Visualize pixel intensity distributions
def plot_pixel_distributions(monet_ds, photo_ds, num_samples=100):
"""Plot pixel intensity distributions for both datasets"""
# Collect pixel values
monet_pixels = []
photo_pixels = []
for monet_batch, photo_batch in zip(monet_ds.take(num_samples), photo_ds.take(num_samples)):
monet_pixels.extend(monet_batch.numpy().flatten())
photo_pixels.extend(photo_batch.numpy().flatten())
# Plot distributions
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
# Monet distribution
ax1.hist(monet_pixels, bins=50, alpha=0.7, color='blue', density=True)
ax1.set_title('Monet Dataset - Pixel Intensity Distribution')
ax1.set_xlabel('Pixel Value (normalized)')
ax1.set_ylabel('Density')
ax1.grid(True, alpha=0.3)
ax1.axvline(np.mean(monet_pixels), color='red', linestyle='--', label='Mean: %.3f' % np.mean(monet_pixels))
ax1.legend()
# Photo distribution
ax2.hist(photo_pixels, bins=50, alpha=0.7, color='green', density=True)
ax2.set_title('Photo Dataset - Pixel Intensity Distribution')
ax2.set_xlabel('Pixel Value (normalized)')
ax2.set_ylabel('Density')
ax2.grid(True, alpha=0.3)
ax2.axvline(np.mean(photo_pixels), color='red', linestyle='--', label='Mean: %.3f' % np.mean(photo_pixels))
ax2.legend()
plt.tight_layout()
plt.show()
return monet_pixels, photo_pixels
monet_pixels, photo_pixels = plot_pixel_distributions(monet_ds, photo_ds)
# Analyze image similarities within and between domains
def analyze_image_similarities(monet_ds, photo_ds, num_samples=50):
"""Analyze structural similarities between images"""
print("=== Image Similarity Analysis ===")
# Collect sample images
monet_samples = []
photo_samples = []
for monet_batch, photo_batch in zip(monet_ds.take(num_samples), photo_ds.take(num_samples)):
monet_img = (monet_batch[0].numpy() * 127.5 + 127.5).astype(np.uint8)
photo_img = (photo_batch[0].numpy() * 127.5 + 127.5).astype(np.uint8)
monet_samples.append(cv2.cvtColor(monet_img, cv2.COLOR_RGB2GRAY))
photo_samples.append(cv2.cvtColor(photo_img, cv2.COLOR_RGB2GRAY))
# Calculate within-domain similarities
def calculate_within_similarity(images, domain_name):
similarities = []
for i in range(len(images)):
for j in range(i + 1, len(images)):
sim = ssim(images[i], images[j])
similarities.append(sim)
print(domain_name, "- Within-domain similarity:")
print("\tMean SSIM: %.3f" % np.mean(similarities))
print("\tStd SSIM: %.3f" % np.std(similarities))
print("\tRange: [%.3f, %.3f]" % (np.min(similarities), np.max(similarities)))
return similarities
# Calculate cross-domain similarities
def calculate_cross_similarity(images1, images2, domain1, domain2):
similarities = []
for img1 in images1:
for img2 in images2:
sim = ssim(img1, img2)
similarities.append(sim)
print("%s-%s - Cross-domain similarity:" % (domain1, domain2))
print("\tMean SSIM: %.3f" % np.mean(similarities))
print("\tStd SSIM: %.3f" % np.std(similarities))
return similarities
monet_similarities = calculate_within_similarity(monet_samples, "Monet")
photo_similarities = calculate_within_similarity(photo_samples, "Photo")
cross_similarities = calculate_cross_similarity(monet_samples, photo_samples, "Monet", "Photo")
# Plot similarity distributions
plt.figure(figsize=(12, 5))
plt.subplot(1, 3, 1)
plt.hist(monet_similarities, bins=20, alpha=0.7, color='blue')
plt.title('Monet-Monet Similarities')
plt.xlabel('SSIM')
plt.ylabel('Frequency')
plt.subplot(1, 3, 2)
plt.hist(photo_similarities, bins=20, alpha=0.7, color='green')
plt.title('Photo-Photo Similarities')
plt.xlabel('SSIM')
plt.ylabel('Frequency')
plt.subplot(1, 3, 3)
plt.hist(cross_similarities, bins=20, alpha=0.7, color='red')
plt.title('Monet-Photo Similarities')
plt.xlabel('SSIM')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
analyze_image_similarities(monet_ds, photo_ds)
=== Image Similarity Analysis ===
Monet - Within-domain similarity:
    Mean SSIM: 0.124
    Std SSIM: 0.062
    Range: [0.026, 0.358]
Photo - Within-domain similarity:
    Mean SSIM: 0.139
    Std SSIM: 0.084
    Range: [0.020, 0.675]
Monet-Photo - Cross-domain similarity:
    Mean SSIM: 0.125
    Std SSIM: 0.069
# Enhanced visualization of dataset samples
def visualize_dataset_samples(monet_ds, photo_ds, num_samples=5):
"""Visualize samples from both domains with enhanced analysis"""
fig, axes = plt.subplots(2, num_samples, figsize=(20, 8))
# Monet samples
for i, img in enumerate(monet_ds.take(num_samples)):
img_np = (img[0].numpy() * 0.5 + 0.5) # Convert back to [0,1]
axes[0, i].imshow(img_np)
axes[0, i].set_title('Monet Sample %d\nMean: %.3f' % (i+1, img_np.mean()))
axes[0, i].axis('off')
# Add pixel intensity histogram inset
inset = axes[0, i].inset_axes([0.6, 0.02, 0.35, 0.25])
inset.hist(img_np.flatten(), bins=50, alpha=0.7)
inset.set_xticks([])
inset.set_yticks([])
# Photo samples
for i, img in enumerate(photo_ds.take(num_samples)):
img_np = (img[0].numpy() * 0.5 + 0.5) # Convert back to [0,1]
axes[1, i].imshow(img_np)
axes[1, i].set_title('Photo Sample %d\nMean: %.3f' % (i+1, img_np.mean()))
axes[1, i].axis('off')
# Add pixel intensity histogram inset
inset = axes[1, i].inset_axes([0.6, 0.02, 0.35, 0.25])
inset.hist(img_np.flatten(), bins=50, alpha=0.7)
inset.set_xticks([])
inset.set_yticks([])
plt.suptitle('Dataset Samples with Pixel Distribution Insets', fontsize=16)
plt.tight_layout()
plt.show()
visualize_dataset_samples(monet_ds, photo_ds)
Dataset summary and insights¶
Dataset Sizes:
- Monet paintings: 300 images
- Photos: 7038 images
- Total: 7338 images
Image Specifications:
- Dimensions: 256x256 pixels
- Channels: 3 (RGB)
- Format: JPEG from TFRecords
- Normalization: Scaled to [-1, 1] range
Notes:
- The Monet dataset has a mean pixel intensity of roughly 0.05 on the [-1, 1] scale, while the photo dataset has a lower mean of roughly -0.12.
- TFRecords are a binary storage format by TensorFlow. The images stored inside are in JPEG format. The output is also expected to be in JPEG format.
- The data is not normalized in storage. The decode_image function makes sure it is in the [-1, 1] range, suitable for tanh layers.
Build the generator¶
We'll be using a UNET architecture for our CycleGAN. To build our generator, let's first define our downsample and upsample methods.
The downsample block, as the name suggests, reduces the spatial dimensions (width and height) of the image according to the stride. The stride is the length of the step the filter takes; since the stride is 2, the filter is applied to every other pixel, hence halving the width and height.
We'll be using LayerNormalization instead of instance normalization.
OUTPUT_CHANNELS = 3
def downsample(filters, size, apply_normalization=True):
initializer = tf.random_normal_initializer(0., 0.02)
result = keras.Sequential()
result.add(layers.Conv2D(filters, size, strides=2, padding='same',
kernel_initializer=initializer, use_bias=False))
if apply_normalization:
result.add(layers.LayerNormalization(epsilon=1e-5))
result.add(layers.LeakyReLU())
return result
The upsample block does the opposite of downsample and increases the spatial dimensions of the image. Conv2DTranspose is essentially the inverse operation of a Conv2D layer.
def upsample(filters, size, dropout=0):
initializer = tf.random_normal_initializer(0., 0.02)
result = keras.Sequential()
result.add(layers.Conv2DTranspose(filters, size, strides=2,
padding='same',
kernel_initializer=initializer,
use_bias=False))
result.add(layers.LayerNormalization(epsilon=1e-5))
if dropout > 0:
result.add(layers.Dropout(dropout))
result.add(layers.ReLU())
return result
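As a quick illustrative check (an addition, not part of the original tutorial), passing a dummy batch through one downsample and one upsample block confirms that the spatial dimensions are halved and then doubled again:
# Illustrative shape check for the building blocks defined above.
dummy = tf.zeros([1, 256, 256, 3])
down_out = downsample(64, 4)(dummy)
print(down_out.shape)  # expected: (1, 128, 128, 64)
up_out = upsample(64, 4)(down_out)
print(up_out.shape)    # expected: (1, 256, 256, 64)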
Let's build our generator!
The generator first downsamples the input image and then upsamples it while establishing long skip connections. Skip connections help mitigate the vanishing gradient problem by passing the output of a layer to more than just the immediately following layer. Here we concatenate the output of each downsample layer with the corresponding upsample layer in a symmetrical fashion.
def Generator(dropout=.5):
inputs = layers.Input(shape=[256,256,3])
# bs = batch size
down_stack = [
downsample(64, 4, apply_normalization=False), # (bs, 128, 128, 64)
downsample(128, 4), # (bs, 64, 64, 128)
downsample(256, 4), # (bs, 32, 32, 256)
downsample(512, 4), # (bs, 16, 16, 512)
downsample(512, 4), # (bs, 8, 8, 512)
downsample(512, 4), # (bs, 4, 4, 512)
downsample(512, 4), # (bs, 2, 2, 512)
downsample(512, 4), # (bs, 1, 1, 512)
]
up_stack = [
upsample(512, 4, dropout=dropout), # (bs, 2, 2, 1024)
upsample(512, 4, dropout=dropout), # (bs, 4, 4, 1024)
upsample(512, 4, dropout=dropout), # (bs, 8, 8, 1024)
upsample(512, 4), # (bs, 16, 16, 1024)
upsample(256, 4), # (bs, 32, 32, 512)
upsample(128, 4), # (bs, 64, 64, 256)
upsample(64, 4), # (bs, 128, 128, 128)
]
initializer = tf.random_normal_initializer(0., 0.02)
last = layers.Conv2DTranspose(OUTPUT_CHANNELS, 4,
strides=2,
padding='same',
kernel_initializer=initializer,
activation='tanh') # (bs, 256, 256, 3)
x = inputs
# Downsampling through the model
skips = []
for down in down_stack:
x = down(x)
skips.append(x)
skips = reversed(skips[:-1])
# Upsampling and establishing the skip connections
for up, skip in zip(up_stack, skips):
x = up(x)
x = layers.Concatenate()([x, skip])
x = last(x)
return keras.Model(inputs=inputs, outputs=x)
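As a sanity check (illustrative only, using a throwaway model), the U-Net generator should map a 256x256x3 image to another 256x256x3 image:
# Illustrative check: the generator preserves the image shape.
test_gen = Generator()
print(test_gen(tf.zeros([1, 256, 256, 3])).shape)  # expected: (1, 256, 256, 3)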
Build the discriminator¶
The discriminator takes in the input image and classifies it as real or fake (generated). Instead of outputting a single node, the discriminator outputs a smaller 2D map, with higher values indicating a real classification and lower values indicating a fake classification.
def Discriminator():
initializer = tf.random_normal_initializer(0., 0.02)
inp = layers.Input(shape=[256, 256, 3], name='input_image')
x = inp
down1 = downsample(64, 4, False)(x) # (bs, 128, 128, 64)
down2 = downsample(128, 4)(down1) # (bs, 64, 64, 128)
down3 = downsample(256, 4)(down2) # (bs, 32, 32, 256)
zero_pad1 = layers.ZeroPadding2D()(down3) # (bs, 34, 34, 256)
conv = layers.Conv2D(512, 4, strides=1,
kernel_initializer=initializer,
use_bias=False)(zero_pad1) # (bs, 31, 31, 512)
norm1 = layers.LayerNormalization(epsilon=1e-5)(conv)
leaky_relu = layers.LeakyReLU()(norm1)
zero_pad2 = layers.ZeroPadding2D()(leaky_relu) # (bs, 33, 33, 512)
last = layers.Conv2D(1, 4, strides=1,
kernel_initializer=initializer)(zero_pad2) # (bs, 30, 30, 1)
return tf.keras.Model(inputs=inp, outputs=last)
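As an illustrative check of the patch-based output described above (a sketch, not in the original tutorial), the discriminator maps a 256x256x3 image to a 30x30 grid of real/fake scores:
# Illustrative check: the discriminator outputs a 30x30 patch map rather than a single score.
test_disc = Discriminator()
print(test_disc(tf.zeros([1, 256, 256, 3])).shape)  # expected: (1, 30, 30, 1)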
with strategy.scope():
monet_generator = Generator() # transforms photos to Monet-esque paintings
photo_generator = Generator() # transforms Monet paintings to be more like photos
monet_discriminator = Discriminator() # differentiates real Monet paintings and generated Monet paintings
photo_discriminator = Discriminator() # differentiates real photos and generated photos
Since our generators are not trained yet, the generated Monet-esque image does not yet look like what we expect.
to_monet = monet_generator(example_photo)
plt.subplot(1, 2, 1)
plt.title("Original Photo")
plt.imshow(example_photo[0] * 0.5 + 0.5)
plt.subplot(1, 2, 2)
plt.title("Monet-esque Photo")
plt.imshow(to_monet[0] * 0.5 + 0.5)
plt.show()
Build the CycleGAN model¶
We will subclass a tf.keras.Model so that we can run fit() later to train our model. During the training step, the model transforms a photo to a Monet painting and then back to a photo. The difference between the original photo and the twice-transformed photo is the cycle-consistency loss. We want the original photo and the twice-transformed photo to be similar to one another.
The losses are defined in the next section.
class CycleGan(keras.Model):
def __init__(
self,
monet_generator,
photo_generator,
monet_discriminator,
photo_discriminator,
lambda_cycle=10,
):
super(CycleGan, self).__init__()
self.m_gen = monet_generator
self.p_gen = photo_generator
self.m_disc = monet_discriminator
self.p_disc = photo_discriminator
self.lambda_cycle = lambda_cycle
def call(self, inputs, training=False):
# For inference, return the monet generator output
return self.m_gen(inputs, training=training)
def compile(
self,
m_gen_optimizer,
p_gen_optimizer,
m_disc_optimizer,
p_disc_optimizer,
gen_loss_fn,
disc_loss_fn,
cycle_loss_fn,
identity_loss_fn
):
super(CycleGan, self).compile()
self.m_gen_optimizer = m_gen_optimizer
self.p_gen_optimizer = p_gen_optimizer
self.m_disc_optimizer = m_disc_optimizer
self.p_disc_optimizer = p_disc_optimizer
self.gen_loss_fn = gen_loss_fn
self.disc_loss_fn = disc_loss_fn
self.cycle_loss_fn = cycle_loss_fn
self.identity_loss_fn = identity_loss_fn
def train_step(self, batch_data):
real_monet, real_photo = batch_data
with tf.GradientTape(persistent=True) as tape:
# photo to monet back to photo
fake_monet = self.m_gen(real_photo, training=True)
cycled_photo = self.p_gen(fake_monet, training=True)
# monet to photo back to monet
fake_photo = self.p_gen(real_monet, training=True)
cycled_monet = self.m_gen(fake_photo, training=True)
# generating itself
same_monet = self.m_gen(real_monet, training=True)
same_photo = self.p_gen(real_photo, training=True)
# discriminator used to check, inputing real images
disc_real_monet = self.m_disc(real_monet, training=True)
disc_real_photo = self.p_disc(real_photo, training=True)
# discriminator used to check, inputing fake images
disc_fake_monet = self.m_disc(fake_monet, training=True)
disc_fake_photo = self.p_disc(fake_photo, training=True)
# evaluates generator loss
monet_gen_loss = self.gen_loss_fn(disc_fake_monet)
photo_gen_loss = self.gen_loss_fn(disc_fake_photo)
# evaluates total cycle consistency loss
total_cycle_loss = self.cycle_loss_fn(real_monet, cycled_monet, self.lambda_cycle) + self.cycle_loss_fn(real_photo, cycled_photo, self.lambda_cycle)
# evaluates total generator loss
total_monet_gen_loss = monet_gen_loss + total_cycle_loss + self.identity_loss_fn(real_monet, same_monet, self.lambda_cycle)
total_photo_gen_loss = photo_gen_loss + total_cycle_loss + self.identity_loss_fn(real_photo, same_photo, self.lambda_cycle)
# evaluates discriminator loss
monet_disc_loss = self.disc_loss_fn(disc_real_monet, disc_fake_monet)
photo_disc_loss = self.disc_loss_fn(disc_real_photo, disc_fake_photo)
# Calculate the gradients for generator and discriminator
monet_generator_gradients = tape.gradient(total_monet_gen_loss,
self.m_gen.trainable_variables)
photo_generator_gradients = tape.gradient(total_photo_gen_loss,
self.p_gen.trainable_variables)
monet_discriminator_gradients = tape.gradient(monet_disc_loss,
self.m_disc.trainable_variables)
photo_discriminator_gradients = tape.gradient(photo_disc_loss,
self.p_disc.trainable_variables)
# Apply the gradients to the optimizer
self.m_gen_optimizer.apply_gradients(zip(monet_generator_gradients,
self.m_gen.trainable_variables))
self.p_gen_optimizer.apply_gradients(zip(photo_generator_gradients,
self.p_gen.trainable_variables))
self.m_disc_optimizer.apply_gradients(zip(monet_discriminator_gradients,
self.m_disc.trainable_variables))
self.p_disc_optimizer.apply_gradients(zip(photo_discriminator_gradients,
self.p_disc.trainable_variables))
return {
"monet_gen_loss": total_monet_gen_loss,
"photo_gen_loss": total_photo_gen_loss,
"monet_disc_loss": monet_disc_loss,
"photo_disc_loss": photo_disc_loss
}
# Create a sample input of the appropriate shape to use before training the model.
# Evaluating the model with this input before training will ensure that the weights are initialised with the correct shape.
sample_input = tf.random.normal([1, 256, 256, 3])
Define loss functions¶
The discriminator loss function below compares the discriminator's output on real images to a matrix of 1s and its output on generated images to a matrix of 0s. A perfect discriminator would output all 1s for real images and all 0s for fake images. The discriminator loss is the average of the real and generated losses.
with strategy.scope():
def discriminator_loss(real, generated):
real_loss = tf.keras.losses.BinaryCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(tf.ones_like(real), real)
generated_loss = tf.keras.losses.BinaryCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(tf.zeros_like(generated), generated)
total_disc_loss = real_loss + generated_loss
return total_disc_loss * 0.5
The generator wants to fool the discriminator into thinking the generated image is real. A perfect generator will make the discriminator output only 1s. Thus, the generator loss compares the discriminator's output on the generated image to a matrix of 1s.
with strategy.scope():
def generator_loss(generated):
return tf.keras.losses.BinaryCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(tf.ones_like(generated), generated)
We want our original photo and the twice-transformed photo to be similar to one another. Thus, we can calculate the cycle-consistency loss by taking the average of their absolute difference.
with strategy.scope():
def calc_cycle_loss(real_image, cycled_image, LAMBDA):
loss1 = tf.reduce_mean(tf.abs(real_image - cycled_image))
return LAMBDA * loss1
The identity loss feeds an image through its own generator (i.e. a photo through the photo generator). Since the input is already a photo, we want the generator to return the same image. The identity loss therefore compares the generator's input with its output.
with strategy.scope():
def identity_loss(real_image, same_image, LAMBDA):
loss = tf.reduce_mean(tf.abs(real_image - same_image))
return LAMBDA * 0.5 * loss
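To make the behaviour of these four losses concrete, here is a tiny illustrative example on dummy tensors (an addition for clarity, not from the tutorial). A discriminator that is confident real images are real and fakes are fake gets a near-zero loss, while the corresponding generator loss is large; identical images give zero cycle and identity losses:
# Illustrative example of the loss functions on dummy tensors.
real_logits = tf.ones([1, 30, 30, 1]) * 5.0    # discriminator is confident these are real
fake_logits = tf.ones([1, 30, 30, 1]) * -5.0   # discriminator is confident these are fake
print(float(tf.reduce_mean(discriminator_loss(real_logits, fake_logits))))  # ~0: discriminator is doing well
print(float(tf.reduce_mean(generator_loss(fake_logits))))                   # large: the generator fooled nobody

img = tf.zeros([1, 256, 256, 3])
print(float(calc_cycle_loss(img, img, LAMBDA=10)))  # 0.0 for a perfect round trip
print(float(identity_loss(img, img, LAMBDA=10)))    # 0.0 for a perfect identity mapping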
Train the CycleGAN¶
Let's compile our model. Since we used tf.keras.Model to build our CycleGAN, we can simply use the fit() function to train our model.
with strategy.scope():
monet_generator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
photo_generator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
monet_discriminator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
photo_discriminator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
with strategy.scope():
cycle_gan_model = CycleGan(
monet_generator, photo_generator, monet_discriminator, photo_discriminator
)
cycle_gan_model.compile(
m_gen_optimizer = monet_generator_optimizer,
p_gen_optimizer = photo_generator_optimizer,
m_disc_optimizer = monet_discriminator_optimizer,
p_disc_optimizer = photo_discriminator_optimizer,
gen_loss_fn = generator_loss,
disc_loss_fn = discriminator_loss,
cycle_loss_fn = calc_cycle_loss,
identity_loss_fn = identity_loss
)
# Build the model by calling it with a sample input
_ = cycle_gan_model(sample_input, training=False)
cycle_gan_model.fit(
tf.data.Dataset.zip((monet_ds, photo_ds)),
epochs=10
)
#cycle_gan_model.m_gen.save("models/cycle_gan_generator10.keras")
Epoch 1/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 211s 586ms/step - monet_disc_loss: 0.9167 - monet_gen_loss: 3.4670 - photo_disc_loss: 0.9082 - photo_gen_loss: 3.7883
Epoch 2/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 192s 641ms/step - monet_disc_loss: 0.9417 - monet_gen_loss: 2.5165 - photo_disc_loss: 0.8053 - photo_gen_loss: 2.9121
Epoch 3/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 203s 678ms/step - monet_disc_loss: 0.8465 - monet_gen_loss: 1.9604 - photo_disc_loss: 0.8237 - photo_gen_loss: 2.4006
Epoch 4/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 216s 719ms/step - monet_disc_loss: 0.8124 - monet_gen_loss: 1.9616 - photo_disc_loss: 0.8436 - photo_gen_loss: 2.3075
Epoch 5/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 213s 711ms/step - monet_disc_loss: 0.8258 - monet_gen_loss: 1.8290 - photo_disc_loss: 0.8467 - photo_gen_loss: 2.2254
Epoch 6/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 210s 700ms/step - monet_disc_loss: 0.8000 - monet_gen_loss: 1.8406 - photo_disc_loss: 0.8479 - photo_gen_loss: 2.2312
Epoch 7/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 215s 717ms/step - monet_disc_loss: 0.8808 - monet_gen_loss: 1.8153 - photo_disc_loss: 0.8339 - photo_gen_loss: 2.2438
Epoch 8/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 219s 729ms/step - monet_disc_loss: 0.8937 - monet_gen_loss: 1.6926 - photo_disc_loss: 0.8292 - photo_gen_loss: 2.1727
Epoch 9/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 215s 718ms/step - monet_disc_loss: 0.8772 - monet_gen_loss: 1.6852 - photo_disc_loss: 0.8114 - photo_gen_loss: 2.1598
Epoch 10/10
300/300 ━━━━━━━━━━━━━━━━━━━━ 210s 700ms/step - monet_disc_loss: 0.8353 - monet_gen_loss: 1.7264 - photo_disc_loss: 0.8049 - photo_gen_loss: 2.1495
<keras.src.callbacks.history.History at 0x7f8d60f72f90>
Visualize our Monet-esque photos¶
_, ax = plt.subplots(5, 2, figsize=(12, 12))
for i, img in enumerate(photo_ds.take(5)):
prediction = monet_generator(img, training=False)[0].numpy()
prediction = (prediction * 127.5 + 127.5).astype(np.uint8)
img = (img[0] * 127.5 + 127.5).numpy().astype(np.uint8)
ax[i, 0].imshow(img)
ax[i, 1].imshow(prediction)
ax[i, 0].set_title("Input Photo")
ax[i, 1].set_title("Monet-esque")
ax[i, 0].axis("off")
ax[i, 1].axis("off")
plt.show()
CycleGAN vs Styled CycleGAN¶
CycleGAN¶
The above model was used basically as-is from the tutorial. It used three categories of loss functions:
- Adversarial losses: Do the photos/Monet images look real? This encompasses the generator losses of both generators, as well as their respective discriminator losses.
- Cycle-consistency loss: can I convert it back? (Remember the translator analogy.) This is accounted for by the total_cycle_loss.
- Identity loss: This helps preserve the colour and tone of the image. When the monet_generator is given a Monet image, the same image is expected at the output. When the photo_generator is given a photo, the same photo is expected as well.
Styled CycleGAN¶
You might have sensed a problem with the above model - the output images are hardly different from the input photos. If the goal is to make them have Monet-esque style, then it is failing horribly. You might say that this is because it was trained for only 10 epochs, but as we will see below, the problem persists even with 50 epochs.
Therefore, I decided to force the hand of the generator by adding a new loss: the style loss. The style loss is calculated by extracting the texture and style from the real Monet and generated Monet and comparing them to each other.
How is this achieved? To do this, a pre-trained model called VGG19 is used. The layers of VGG19 can be classified as:
- Shallow layers: These capture low-level features such as colours, edges, brush strokes, etc.
- Deep layers: These learn the high-level features - the actual contents of the image.
Given that, transferring the style can be achieved by adding two losses. The first is the style loss. A special matrix called the Gram matrix captures the co-occurrences between the different features in a layer, giving us a fingerprint of the style. Comparing this fingerprint between the real Monet and the generated Monet across the shallow layers gives us the style loss.
To make sure the content is not lost due to the introduction of the style loss, an additional content loss is added, which compares the features of the original photo and the generated Monet using the deep layers of the VGG19 model. The combination of the style loss and the content loss ensures that only the style is transferred while the content is preserved.
Note that the use of VGG19 is a form of transfer learning, a concept we learned about in the last lecture of module 3. It is used as a fixed feature extractor, and only a part of the layers is used. This is appropriate as the new dataset is relatively small and not very different from the original dataset.
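For intuition, the Gram matrix of a feature map of shape (batch, height, width, channels) is a channels-by-channels matrix measuring how strongly each pair of feature channels fires together, averaged over all spatial locations. A minimal illustration follows (the Styled CycleGAN below computes it with the same einsum):
# Minimal Gram matrix illustration: a (1, H, W, C) feature map becomes a (1, C, C) style fingerprint.
features = tf.random.normal([1, 64, 64, 8])
gram = tf.linalg.einsum('bijc,bijd->bcd', features, features) / tf.cast(64 * 64, tf.float32)
print(gram.shape)  # (1, 8, 8)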
Build the Styled CycleGAN model¶
class StyledCycleGAN(CycleGan):
def __init__(
self,
monet_generator,
photo_generator,
monet_discriminator,
photo_discriminator,
lambda_cycle=10,
lambda_style=1.0,
lambda_content=0.5,
**kwargs
):
super().__init__(
monet_generator, photo_generator,
monet_discriminator, photo_discriminator,
lambda_cycle, **kwargs
)
self.lambda_style = lambda_style
self.lambda_content = lambda_content
# Pre-trained VGG for perceptual losses
self.vgg = self.build_vgg_feature_extractor()
def build_vgg_feature_extractor(self):
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False
# Extract features from specific layers
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']
content_layers = ['block4_conv2']
outputs = [vgg.get_layer(name).output for name in style_layers + content_layers]
return tf.keras.Model(vgg.input, outputs)
def gram_matrix(self, input_tensor):
result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
input_shape = tf.shape(input_tensor)
num_locations = tf.cast(input_shape[1] * input_shape[2], tf.float32)
return result / num_locations
def style_loss(self, real, generated):
real_features = self.vgg(real)
generated_features = self.vgg(generated)
style_loss = 0
for real_feat, gen_feat in zip(real_features[:5], generated_features[:5]):
real_gram = self.gram_matrix(real_feat)
gen_gram = self.gram_matrix(gen_feat)
style_loss += tf.reduce_mean(tf.abs(real_gram - gen_gram))
return style_loss / 5
def content_loss(self, real, generated):
real_features = self.vgg(real)
generated_features = self.vgg(generated)
content_loss = 0
for real_feat, gen_feat in zip(real_features[5:], generated_features[5:]):
content_loss += tf.reduce_mean(tf.abs(real_feat - gen_feat))
return content_loss
def train_step(self, batch_data):
real_monet, real_photo = batch_data
with tf.GradientTape(persistent=True) as tape:
# Standard CycleGAN forward pass
fake_monet = self.m_gen(real_photo, training=True)
cycled_photo = self.p_gen(fake_monet, training=True)
fake_photo = self.p_gen(real_monet, training=True)
cycled_monet = self.m_gen(fake_photo, training=True)
# Additional artistic losses
style_loss = self.style_loss(real_monet, fake_monet)
content_loss = self.content_loss(real_photo, fake_monet)
# Standard CycleGAN losses
disc_real_monet = self.m_disc(real_monet, training=True)
disc_real_photo = self.p_disc(real_photo, training=True)
disc_fake_monet = self.m_disc(fake_monet, training=True)
disc_fake_photo = self.p_disc(fake_photo, training=True)
monet_gen_loss = self.gen_loss_fn(disc_fake_monet)
photo_gen_loss = self.gen_loss_fn(disc_fake_photo)
total_cycle_loss = self.cycle_loss_fn(real_monet, cycled_monet, self.lambda_cycle) + \
self.cycle_loss_fn(real_photo, cycled_photo, self.lambda_cycle)
# Enhanced generator losses with artistic terms
total_monet_gen_loss = (monet_gen_loss + total_cycle_loss +
self.lambda_style * style_loss +
self.lambda_content * content_loss)
total_photo_gen_loss = photo_gen_loss + total_cycle_loss
monet_disc_loss = self.disc_loss_fn(disc_real_monet, disc_fake_monet)
photo_disc_loss = self.disc_loss_fn(disc_real_photo, disc_fake_photo)
# Calculate gradients
monet_gen_gradients = tape.gradient(total_monet_gen_loss, self.m_gen.trainable_variables)
photo_gen_gradients = tape.gradient(total_photo_gen_loss, self.p_gen.trainable_variables)
monet_disc_gradients = tape.gradient(monet_disc_loss, self.m_disc.trainable_variables)
photo_disc_gradients = tape.gradient(photo_disc_loss, self.p_disc.trainable_variables)
# Apply gradients
self.m_gen_optimizer.apply_gradients(zip(monet_gen_gradients, self.m_gen.trainable_variables))
self.p_gen_optimizer.apply_gradients(zip(photo_gen_gradients, self.p_gen.trainable_variables))
self.m_disc_optimizer.apply_gradients(zip(monet_disc_gradients, self.m_disc.trainable_variables))
self.p_disc_optimizer.apply_gradients(zip(photo_disc_gradients, self.p_disc.trainable_variables))
return {
"monet_gen_loss": total_monet_gen_loss,
"photo_gen_loss": total_photo_gen_loss,
"monet_disc_loss": monet_disc_loss,
"photo_disc_loss": photo_disc_loss,
"style_loss": style_loss,
"content_loss": content_loss
}
Train the Styled CycleGAN model¶
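The training cell itself is not reproduced here. As a rough sketch (assuming fresh generators and discriminators so that no weights are shared with the plain CycleGAN above), the styled model is compiled and fitted exactly like before:
# Rough sketch: compile and train the Styled CycleGAN with fresh models (assumed setup).
with strategy.scope():
    styled_model = StyledCycleGAN(
        Generator(), Generator(), Discriminator(), Discriminator(),
        lambda_style=1.0, lambda_content=0.5
    )
    styled_model.compile(
        m_gen_optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5),
        p_gen_optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5),
        m_disc_optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5),
        p_disc_optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5),
        gen_loss_fn=generator_loss,
        disc_loss_fn=discriminator_loss,
        cycle_loss_fn=calc_cycle_loss,
        identity_loss_fn=identity_loss
    )
styled_model.fit(tf.data.Dataset.zip((monet_ds, photo_ds)), epochs=10)
# styled_model.m_gen.save("models/styled_generator10.keras")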
Compare Results from All Models¶
The above models were trained with epochs ranging from 10 to 50. The results of each CycleGAN and Styled CycleGAN are displayed for comparison, and the original photos are included for reference.
def compare_models(test_photos, models_dict):
num_models = len(models_dict) + 1
num_examples = 4 # Just use a fixed number of examples
rows = plt.figure(layout="constrained", figsize=(3 * num_examples, 3 * num_models)).subfigures(num_models, 1)
rows[0].suptitle("Original Photos", fontsize='xx-large')
for ax, img in zip(rows[0].subplots(1, num_examples), test_photos.take(num_examples)):
# Original photo
original = (img[0] * 127.5 + 127.5).numpy().astype(np.uint8)
ax.imshow(original)
ax.axis('off')
for row, (model_name, model_path) in zip(rows[1:], models_dict.items()):
model = tf.keras.models.load_model(model_path)
row.suptitle(model_name, fontsize='xx-large')
for ax, img in zip(row.subplots(1, num_examples), test_photos.take(num_examples)):
prediction = model(img, training=False)[0].numpy()
prediction = (prediction * 127.5 + 127.5).astype(np.uint8)
ax.imshow(prediction)
ax.axis('off')
# plt.tight_layout()
plt.show()
# Compare all models
models_to_compare = {
"CycleGAN, 10 epochs": "models/cycle_gan_generator10.keras",
"CycleGAN, 20 epochs": "models/cycle_gan_generator20.keras",
"CycleGAN, 30 epochs": "models/cycle_gan_generator30.keras",
"CycleGAN, 40 epochs": "models/cycle_gan_generator40.keras",
"CycleGAN, 50 epochs": "models/cycle_gan_generator50.keras",
"Styled CycleGAN, 10 epochs": "models/styled_generator10.keras",
"Styled CycleGAN, 20 epochs": "models/styled_generator20.keras",
"Styled CycleGAN, 30 epochs": "models/styled_generator30.keras",
"Styled CycleGAN, 40 epochs": "models/styled_generator40.keras",
"Styled CycleGAN, 50 epochs": "models/styled_generator50.keras",
}
compare_models(photo_ds, models_to_compare)
Analysis 1¶
Looking at the plain CycleGAN, some weak painterly patterns appear in the images, but only at epoch 50, and even then the output can hardly be distinguished from the original photo.
On the other hand, the same pattern is clearly visible in the Styled CycleGAN after just 10 epochs of training. However, at 30 epochs the colours get distorted and the original objects are difficult to recognize. At 40 epochs the colours are fixed, and the images do look more painting-like, but with an extra blur. At 50 epochs the blur is less severe and some of the images do look like a proper Monet painting.
Problem¶
Styled CycleGAN is giving better results but there is distortion in the images.
Hyperparameter tuning¶
Try training the model, but change the dropout rate from 0.5 to 0.3 or 0.7.
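Concretely, only the generators change; dropout is applied in the three innermost upsample blocks of the U-Net. A sketch of the variation (assuming the same compile/fit setup as the sketch above):
# Sketch: rebuild the styled model with a different generator dropout rate (0.3 or 0.7),
# then compile and train exactly as before.
DROPOUT = 0.3
with strategy.scope():
    tuned_model = StyledCycleGAN(
        Generator(dropout=DROPOUT), Generator(dropout=DROPOUT),
        Discriminator(), Discriminator()
    )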
Compare Results from the hyperparameter tuning¶
The Styled CycleGAN model was trained with dropouts of 0.3 and 0.7, each with 40 and 50 epochs.
# Compare hyperparameter tuning results
models_to_compare = {
"Styled CycleGAN, 40 epochs, dropout=.3": "models/style_low_dropout40.keras",
"Styled CycleGAN, 50 epochs, dropout=.3": "models/style_low_dropout50.keras",
"Styled CycleGAN, 40 epochs, dropout=.7": "models/style_high_dropout40.keras",
"Styled CycleGAN, 50 epochs, dropout=.7": "models/style_high_dropout50.keras",
}
compare_models(photo_ds, models_to_compare)
Analysis 2¶
Lowering the dropout seems to have been devastating: the images are even less recognizable than the 0.5-dropout version at 30 epochs. On the other hand, at a dropout of 0.7 the images do look clean, but they have lost some of the Monet-esque feel.
Create submission file with best model (Styled CycleGAN)¶
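The submission cell is omitted above; here is a minimal sketch of what it might look like, using the IMG_PATH and ZIP_NAME paths defined during setup (the competition expects a zip archive of generated JPEG images):
# Sketch (assumed, since the original cell is not shown): run every photo through the best
# generator, save the results as JPEGs, and zip them up for submission.
best_generator = tf.keras.models.load_model("models/styled_generator50.keras")
i = 1
for img in photo_ds:
    prediction = best_generator(img, training=False)[0].numpy()
    prediction = (prediction * 127.5 + 127.5).astype(np.uint8)
    Image.fromarray(prediction).save(path.join(IMG_PATH, "%d.jpg" % i))
    i += 1
shutil.make_archive(ZIP_NAME, 'zip', IMG_PATH)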
Submission results¶
I submitted different configurations of the model to Kaggle. Below are tables of the MiFID scores (lower = better):
| Configuration | CycleGAN | Styled CycleGAN |
|---|---|---|
| 50 epochs, dropout=.5 | 70.76 | 59.90 |
| Configuration | 40 epochs | 50 epochs |
|---|---|---|
| Styled CycleGAN, dropout=.5 | 92.25 | 59.90 |
| Configuration | dropout=.3 | dropout=.5 | dropout=.7 |
|---|---|---|---|
| Styled CycleGAN, 50 epochs | 76.75 | 59.90 | 100.65 |
Conclusion¶
From the above results, the best configuration is: Styled CycleGAN, dropout=0.5, epochs=50.
Styled CycleGAN worked better than normal CycleGAN. This shows that the addition of the style_loss and content_loss did indeed help with the style transfer.
One thing that did not work, though, is moderating the pattern so it does not look like noise. Adjusting the dropout did not help; if anything, it showed that visual clarity does not guarantee a better MiFID score.
A possible reason why the models struggled to imitate the Monet style more freely is the pressure to stay close to the original image. I suspect the cycle_loss and identity_loss are the likely culprits. With a better GPU, I would have tried reducing their weights to see if that helps.
Another approach which could have been successful is replacing CycleGAN entirely. The Kaggle competition does not require the Monet-style images to be generated from the provided photos. It would have been interesting to see if ignoring the provided photos would have given better results or not.
Citations¶
The tutorial recommended by Kaggle
- Link: Amy Jang's notebook
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
- Authors: Zhu et al. (2017)
- Link: arXiv:1703.10593
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
- Authors: Johnson et al. (2016)
- Link: arXiv:1603.08155
A Neural Algorithm of Artistic Style
- Authors: Gatys et al. (2015)
- Link: arXiv:1508.06576
Neural Style Transfer from Scratch: A Deep Dive Using VGG19 and Gram Matrices in PyTorch