Understanding Variational Autoencoders (VAE)

Emilie Naples - Master in Big Data Analytics


In this notebook, I train a VAE on a dataset of images of celebrities: CelebA. I will use a reduced set of 11,000 images to speed things up.

Probabilistic model

Let $\mathbf{x}^{(i)}$ be the training images and $\mathbf{z}^{(i)}$ the corresponding latent (unobserved) embeddings of dimension $k$, for $i=1,\ldots,11{,}000$. Then ...

\begin{align} p(\mathbf{z}) &= \mathcal{N}(\mathbf{0},\mathbf{I})\\ p(\mathbf{x}|\mathbf{z}) &= \mathcal{N}(\mu_\theta(\mathbf{z}),\sigma_x\mathbf{I}) \end{align} where $\mu_\theta(\mathbf{z})$ is a Neural Network with input $\mathbf{z}$ and $\sigma_x$ is a reconstruction noise power that we won't train. By default, we will take the value $\sigma_x=0.1$.
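
To make the notation concrete, here is a minimal (hypothetical) sketch of sampling from this generative model, with a placeholder linear layer standing in for the decoder CNN $\mu_\theta$ that we build later; since $\sigma_x$ is a variance, the noise standard deviation is $\sqrt{\sigma_x}$.

In [ ]:
import torch
from torch import nn

k, var_x = 2, 0.1
mu_theta = nn.Linear(k, 3 * 64 * 64)   # placeholder for the decoder network defined later

z = torch.randn(k)                                            # z ~ N(0, I)
x = mu_theta(z) + (var_x ** 0.5) * torch.randn(3 * 64 * 64)   # x ~ N(mu_theta(z), var_x * I)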

Inference model

The goal is to approximate the posterior distribution $p(\mathbf{z}|\mathbf{x})$ by the following parametric model

$$p(\mathbf{z}|\mathbf{x})\approx q(\mathbf{z}|\mathbf{x}) = \mathcal{N}\left(\mu_\eta(\mathbf{x}),\text{diag}(\sigma_\eta(\mathbf{x}))\right)$$

where both $\mu_\eta(\mathbf{x})$ and $\sigma_\eta(\mathbf{x})$ are neural networks with input $\mathbf{x}$.

Training loss: the evidence lower bound (ELBO)

Since exact inference is intractable in this model, we train the parameters of the probabilistic and inference models by maximizing the evidence lower bound (ELBO)

$$\log p(\mathbf{x}) \geq \mathbb{E}_{q(\mathbf{z}|\mathbf{x})}\left[\log p(\mathbf{x}|\mathbf{z})\right] - D_{KL}\left(q(\mathbf{z}|\mathbf{x})\,\|\,p(\mathbf{z})\right)$$

Since $p(\mathbf{z})=\mathcal{N}(\mathbf{0},\mathbf{I})$ and $q(\mathbf{z}|\mathbf{x})$ are both Gaussian, the KL divergence has the closed form

$$D_{KL}\left(q(\mathbf{z}|\mathbf{x})\,\|\,p(\mathbf{z})\right) = \frac{1}{2}\left(-k+\sum_{j=1}^k \left(\sigma_{\eta,j}-\log\sigma_{\eta,j}+\mu_{\eta,j}^2\right)\right)$$
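
As an optional sanity check (not part of the original notebook), this closed-form expression can be compared against torch.distributions for a random Gaussian $q$:

In [ ]:
import torch
from torch.distributions import Normal, kl_divergence

k = 5
mu = torch.randn(k)         # mu_eta
var = torch.rand(k) + 0.1   # sigma_eta (per-dimension variances)

kl_closed = 0.5 * (-k + torch.sum(var - torch.log(var) + mu ** 2))
kl_ref = kl_divergence(Normal(mu, var.sqrt()), Normal(torch.zeros(k), torch.ones(k))).sum()

print(kl_closed, kl_ref)    # the two values should agree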

During training, the first expectation will be replaced by a one-sample Monte Carlo estimate for every data point. Overall, the ELBO to be optimized using SGD is

$$\text{ELBO}(\theta,\eta,\sigma_x) = \frac{1}{N} \sum_{i=1}^{N} \left(\log p(\mathbf{x}^{(i)}|\tilde{\mathbf{z}}^{(i)}) - D_{KL}\left(q(\mathbf{z}|\mathbf{x}^{(i)})\,\|\,p(\mathbf{z})\right)\right)$$

where $\tilde{\mathbf{z}}^{(i)}$ is a sample from $q(\mathbf{z}|\mathbf{x}^{(i)})$ computed as

$$\tilde{\mathbf{z}}^{(i)} = \mu_\eta(\mathbf{x}^{(i)}) + \sqrt{\sigma_\eta(\mathbf{x}^{(i)})} \odot \epsilon$$ where $\epsilon$ is a sample from a $k$-dimensional standard Gaussian distribution. This is the reparameterization trick: the samples are differentiable w.r.t. the parameters of both $\mu_\eta(\mathbf{x})$ and $\sigma_\eta(\mathbf{x})$.
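
A minimal sketch of the trick in isolation (the tensors mu and var below are stand-ins for the encoder outputs, not part of the original notebook), showing that the sample remains differentiable with respect to both:

In [ ]:
import torch

mu = torch.zeros(32, 2, requires_grad=True)   # stand-in for mu_eta(x)
var = torch.ones(32, 2, requires_grad=True)   # stand-in for sigma_eta(x) (a variance)

eps = torch.randn_like(var)                   # eps ~ N(0, I)
z_tilde = mu + var.sqrt() * eps               # reparameterized sample

z_tilde.sum().backward()                      # gradients flow back to mu and var
print(mu.grad.shape, var.grad.shape)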

In [ ]:
import torch
from torch import nn
import numpy as np
import torchvision
from torchvision import datasets
import torchvision.transforms as transforms
from torch import optim
import time

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Part I: Loading the reduced CelebA dataset

Connect to Google Drive

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
path_to_folder = '/content/drive/My Drive/MM + PPCA/reduced_celebA/'

The following function imports the dataset using PyTorch data loaders ...

In [ ]:
def get_celeba(batch_size, dataset_directory):

    train_transformation = transforms.Compose([
        transforms.Resize((64, 64)),        # THE IMAGES ARE RE-SCALED TO 64x64
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    train_dataset = torchvision.datasets.ImageFolder(dataset_directory + 'celeba', train_transformation)


    # Use sampler for randomization
    training_sampler = torch.utils.data.SubsetRandomSampler(range(len(train_dataset)))

    # Prepare Data Loaders for training and validation
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, sampler=training_sampler)

    return trainloader
In [ ]:
# load the dataset in a single trainloader
trainloader = get_celeba(32,path_to_folder)

Read one mini-batch of images and plot some of them.

In [ ]:
def imshow(img):
    img = img / 2 + 0.5     # unnormalize to plot
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
In [ ]:
train = iter(trainloader)

i,l = next(train)

imshow(i[0,:,:,:])

imshow(torchvision.utils.make_grid(i))

A class to implement the encoder mean and variance

The following class implements the CNN network to obtain both $\mu_\eta(\mathbf{x}), \sigma_\eta(\mathbf{x})$, which determine the moments of the approximate posterior distribution $q(\mathbf{z}|\mathbf{x})$.

We use a single CNN with input $\mathbf{x}$; the output vector of dimension $2k$ is divided into two parts. The first $k$ elements represent the mean, and the variance is computed from the last $k$ elements using a softplus.

The actual design of this network is borrowed from this paper. It could be improved by adding batch-norm layers, but it already performs reasonably well.

Question: What are the spatial dimensions of the 256 feature maps at the output of the conv5 convolutional layer? They are 1×1: the full tensor at the output of conv5 has shape [32, 256, 1, 1] (batch, channels, height, width).
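
This can be checked with the standard output-size formula for a convolution, out = floor((in + 2·padding − kernel) / stride) + 1. A small helper (not part of the original notebook) traces a 64×64 input through the five encoder layers:

In [ ]:
def conv_out(size, kernel, stride, padding):
    # output spatial size of a conv layer: floor((in + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

s = 64
for name, (kernel, stride, padding) in [('conv1', (4, 2, 1)), ('conv2', (4, 2, 1)),
                                        ('conv3', (4, 2, 1)), ('conv4', (4, 2, 1)),
                                        ('conv5', (4, 1, 0))]:
    s = conv_out(s, kernel, stride, padding)
    print(name, s)   # prints 32, 16, 8, 4, 1 -> conv5 outputs 1x1 feature maps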

In [ ]:
class encoder(nn.Module):

    def __init__(self,dimz,channels=3,var_x=0.1):


        super().__init__()

        self.dimz = dimz    #dimz is k, the dimension of the latent space

        # self.conv1 is a convolutional layer, with 32 output channels, kernel size 4, stride 2,
        # and padding 1

        self.conv1 = nn.Conv2d(in_channels=channels, out_channels=32,
                               kernel_size=4, stride=2, padding=1)

        self.relu = nn.ReLU()

        # self.conv2 is a convolutional layer, with 32 output channels, kernel size 4, stride 2,
        # and padding 1

        self.conv2 = nn.Conv2d(in_channels=32, out_channels=32,
                               kernel_size=4, stride=2, padding=1)

        # self.conv3 is a convolutional layer, with 64 output channels, kernel size 4, stride 2,
        # and padding 1

        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64,
                               kernel_size=4, stride=2, padding=1)

        # self.conv4 is a convolutional layer, with 64 output channels, kernel size 4, stride 2,
        # and padding 1

        self.conv4 = nn.Conv2d(in_channels=64, out_channels=64,
                               kernel_size=4, stride=2, padding=1)

        # self.conv5 is a convolutional layer, with 256 output channels, kernel size 4, stride 1,
        # and padding 0

        self.conv5 = nn.Conv2d(in_channels=64, out_channels=256,
                               kernel_size=4, stride=1, padding=0)


        # self.linear is a linear layer with dimz*2 outputs
        self.linear = nn.Linear(in_features = 256, out_features = self.dimz * 2)

        self.softplus = nn.Softplus()

    def forward(self,x):

        # The forward method projects an image into a 2*dimz-dimensional vector

        z = self.relu(self.conv1(x))
        z = self.relu(self.conv2(z))
        z = self.relu(self.conv3(z))
        z = self.relu(self.conv4(z))
        z = self.relu(self.conv5(z))
        # Transform z into a 256-dim vector
        z = torch.squeeze(z)
        z = self.linear(z)

        return z

    def encode_and_sample(self,x,flag_sample=True):

        # This method computes both the posterior mean and variance.
        # Also obtain a sample from the posterior using the
        # reparameterization trick.

        # We obtain the encoder projection using the forward method

        z = self.forward(x)

        # The mean is the first dimz components of the forward output

        mu = z[:, :self.dimz]

        # compute the variance from the last dimz components using a
        # soft plus
        var = self.softplus(0.5 * z[:, self.dimz:])

        sample = None

        if(flag_sample==True):

            eps = torch.randn_like(var)

            sample = mu + eps*(var**0.5)

        return mu,var,sample

Using dimz=2, create an encoder object and print the mean mu_z and variance var_z in the z space for one image.

In [ ]:
encoder_obj = encoder(dimz = 2)
mu_z, var_z, sample = encoder_obj.encode_and_sample(i)

print("mu_z shape: {}".format(mu_z.detach().numpy().shape))
print("mu_z of image 0:")
print(mu_z[0, :])

print("\nvar_z shape: {}".format(var_z.detach().numpy().shape))
print("var_z of image 0:")
print(var_z[0, :])

print("\nSample shape: {}".format(sample.detach().numpy().shape))
mu_z shape: (32, 2)
mu_z of image 0:
tensor([-0.0174, -0.0021], grad_fn=<SliceBackward0>)

var_z shape: (32, 2)
var_z of image 0:
tensor([0.6916, 0.7035], grad_fn=<SliceBackward0>)

Sample shape: (32, 2)

A class to implement the decoder mean

Given a sample of $\mathbf{z}$, we generate images by sampling from $p(\mathbf{x}|\mathbf{z}) = \mathcal{N}(\mu_\theta(\mathbf{z}),\sigma_x\mathbf{I})$, where we take $\sigma_x=0.1$ as the reconstruction noise. The following class implements the mean network $\mu_\theta(\mathbf{z})$ using transposed convolutions. Again, the network design comes from the same paper.

Why do we use a hyperbolic tangent output activation?

Because the input images are normalized to the range $(-1,1)$, the decoder output must cover the same range. The tanh activation produces values in $(-1,1)$, unlike ReLU or a sigmoid, which are bounded below by $0$. Our own version of imshow then un-normalizes the pixels by dividing by $2$ and adding $0.5$ before plotting.
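
As a small sanity check (not in the original notebook): Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) maps pixel values from [0, 1] to [-1, 1], which is exactly the tanh range, and imshow's img / 2 + 0.5 maps them back:

In [ ]:
import torch

x01 = torch.tensor([0.0, 0.5, 1.0])   # example pixel values in [0, 1]
x_norm = (x01 - 0.5) / 0.5            # what transforms.Normalize does per channel
print(x_norm)                         # tensor([-1., 0., 1.]) -- the tanh output range
print(x_norm / 2 + 0.5)               # back to [0, 1] for plotting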

In [ ]:
class decoder(nn.Module):

    def __init__(self,dimz,channels=3,var_x=0.1):

        super().__init__()

        # We expand z into a 256 dimensional vector

        self.linear = nn.Linear(dimz,256)

        self.relu = nn.ReLU()

        self.tanh = nn.Tanh()

        # self.tconv1 is a transposed convolutional layer, with 64 output channels, kernel size 4, stride 1,
        # and padding 0

        self.tconv1 = nn.ConvTranspose2d(256, 64, kernel_size=4, stride=1,padding=0)

        # self.tconv2 is a transposed convolutional layer, with 64 output channels, kernel size 4, stride 2,
        # and padding 1

        self.tconv2 = nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2,padding=1)

        # self.tconv3 is a transposed convolutional layer, with 32 output channels, kernel size 4, stride 2,
        # and padding 1

        self.tconv3 = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2,padding=1)

        # self.tconv4 is a transposed convolutional layer, with 32 output channels, kernel size 4, stride 2,
        # and padding 1

        self.tconv4 = nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2,padding=1)

        # self.tconv5 is a transposed convolutional layer, with channels output channels, kernel size 4, stride 2,
        # and padding 1

        self.tconv5 = nn.ConvTranspose2d(32, channels, kernel_size=4, stride=2,padding=1)

    def forward(self,z):

        x = self.relu(self.linear(z).view(-1,256,1,1))
        x = self.relu(self.tconv1(x))
        x = self.relu(self.tconv2(x))
        x = self.relu(self.tconv3(x))
        x = self.relu(self.tconv4(x))
        x = self.tanh(self.tconv5(x))
        return x

    def decode(self,z):

        # This function simply calls the forward method

        return self.forward(z)

Create a decoder object using dimz=2 again. Given the mean projection of an image mu_z obtained before, use the decoder to obtain the mean x_mean of $p(\mathbf{x}|\mathbf{z})$. Represent one original image versus the reconstruction. Obviously, since we have not trained the model yet, don't expect too much.

In [ ]:
my_dec = decoder(dimz=2)

x_mean = my_dec.decode(mu_z).detach()

imshow(i[0,:,:,:])

imshow(x_mean[0,:,:,:])

Observe that, since the mean x_mean is very different from the real image, the real image is extremely unlikely under the distribution $p(\mathbf{x}|\mathbf{z})$: the model would essentially never generate it by sampling.

For a mini-batch of images x, the following function evaluates the log-likelihood of an independent Gaussian distribution given the mean and the diagonal covariance matrix. Both x and mu_x are passed as images and internally flattened to vectors; var_x is a scalar constant (here 0.1), broadcast internally to a constant variance vector.
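
Concretely, for a flattened image $\mathbf{x}\in\mathbb{R}^D$ with mean $\mu_x$ and constant per-pixel variance $\sigma_x$, the function computes

$$\log p(\mathbf{x}|\mathbf{z}) = -\frac{1}{2}\left(D\log(2\pi) + D\log\sigma_x + \sum_{d=1}^{D}\frac{(x_d-\mu_{x,d})^2}{\sigma_x}\right)$$

and returns both the per-image values and their sum over the mini-batch.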

In [ ]:
def eval_Gaussian_LL(x,mu_x,var_x):

    # x is a mini-batch of images. It has dimension [Batch,3,dimx,dimx]

    # mu_x is a mini-batch of reconstructed images. It has dimension [Batch,3,dimx,dimx]

    # var_x is a torch constant

    D = x.shape[1] * x.shape[2] * x.shape[3]   # Dimension of the image

    x = x.reshape(-1, D)

    mu_x = mu_x.reshape(-1, D)

    var_x = torch.ones_like(mu_x) * var_x

    # Constant term in the gaussian distribution
    cnt = D * np.log(2 * np.pi) + torch.sum(torch.log(var_x), dim=-1)

    # log-likelihood per datapoint

    logp_data = -0.5 * (cnt + torch.sum((x - mu_x) * var_x ** -1 * (x - mu_x), dim=-1))

    # Accumulated Gaussian log-likelihood for all datapoints in the batch
    logp = torch.sum(logp_data)

    return logp,logp_data

Evaluate the log-likelihood of the real images in the mini-batch given x_mean computed above and var_x=0.1.

In [ ]:
var_x = 0.1

logp,logp_data = eval_Gaussian_LL(i,x_mean,var_x)

print(logp)

plt.plot(np.arange(0,32),logp_data)
tensor(-653016.)
Out[ ]:
[<matplotlib.lines.Line2D at 0x7f25f2adbdd0>]

The variational autoencoder class

The following class puts together the VAE encoder and decoder and also defines the loss derived from the ELBO. We extend it later to incorporate a training method.

In [ ]:
class VAE(nn.Module):

    def __init__(self,dimz,channels=3,var_x=0.1):

        super().__init__()

        self.var_x = var_x

        self.dimz = dimz

        # We create an encoder network

        self.encoder = encoder(self.dimz)

        # We create a decoder network

        self.decoder = decoder(self.dimz)

    def forward(self,x):

        # In the forward method, we return the mean and variance
        # given by the encoder network and also the reconstruction mean
        # given by the decoder network using a sample from the
        # encoder's posterior distribution.

        mu_z,var_z,sample_z = self.encoder.encode_and_sample(x)

        # Decoder provides the mean of the reconstruction

        mu_x = self.decoder.decode(sample_z)

        return mu_x,mu_z,var_z

    # Reconstruction + KL divergence losses summed over all elements and batch

    def loss_function(self, x, mu_x, mu_z, var_z):

        # We evaluate the loglikelihood in the batch using the function provided above

        logp,_ = eval_Gaussian_LL(x, mu_x, self.var_x)

        # KL divergence between q(z|x) and N(0, I)
        # see Appendix B from VAE paper:
        # Kingma and Welling. Auto-Encoding Variational Bayes. ICLR, 2014
        # https://arxiv.org/abs/1312.6114

        KLz = -0.5 * torch.sum(1 + torch.log(var_z) - mu_z.pow(2) - var_z)

        # To maximize ELBO we minimize loss (-ELBO)
        return -logp + KLz, -logp, KLz

Create a VAE object with dimz=2 and evaluate the loss function (the negative ELBO and its two terms) using x_mean, mu_z, and var_z computed above.

In [ ]:
my_vae = VAE(dimz = 2)

print(my_vae.loss_function(i,x_mean,mu_z,var_z))
(tensor(653017.8750, grad_fn=<AddBackward0>), tensor(653016.), tensor(1.8520, grad_fn=<MulBackward0>))

Incorporating a training method

The following class completes the implementation of the VAE.

  • Since training is pretty slow (every epoch can take a few minutes), the model is saved every few mini-batches. The user indicates the saving path with the save_folder argument.
  • I have introduced a restore flag. When set to True, it loads the model parameters from a file saved in the save_folder argument.
  • The class also incorporates a method to sample from the generative model, i.e., to create new images. To this end, we sample $\mathbf{z}$ from $p(\mathbf{z}) = \mathcal{N}(0,I)$ and then return the mean of $p(\mathbf{x}|\mathbf{z})$.
In [ ]:
class VAE_extended(VAE):

    def __init__(self, dimz=2,  channels=3, var_x=0.1,lr=1e-3,epochs=20,save_folder='./',restore=False):

        super().__init__(dimz,channels=3,var_x=0.1)

        self.lr = lr
        self.optim = optim.Adam(self.parameters(), self.lr)
        self.epochs = epochs

        self.save_folder = save_folder

        if(restore==True):
            state_dict = torch.load(self.save_folder+'VAE_checkpoint.pth')
            self.load_state_dict(state_dict)

        self.loss_during_training = []
        self.reconstruc_during_training = []
        self.KL_during_training = []

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.to(self.device)

    def trainloop(self,trainloader):

        nims = len(trainloader.dataset)

        self.train()

        for e in range(int(self.epochs)):

            train_loss = 0
            train_rec = 0
            train_kl_l = 0

            idx_batch = 0

            for images, labels in trainloader:

                images = images.to(self.device)

                self.optim.zero_grad()

                mu_x, mu_z, var_z = self.forward(images)

                loss, rec, kl_l = self.loss_function(images,mu_x, mu_z, var_z)

                loss.backward()

                train_loss += loss.item()
                train_rec += rec.item()
                train_kl_l += kl_l.item()

                self.optim.step()

                if(idx_batch%10==0):

                    torch.save(self.state_dict(), self.save_folder + 'VAE_checkpoint.pth')

                idx_batch += 1

            self.loss_during_training.append(train_loss/len(trainloader))
            self.reconstruc_during_training.append(train_rec/len(trainloader))
            self.KL_during_training.append(train_kl_l/len(trainloader))

            if(e%5==0):

                torch.save(self.state_dict(), self.save_folder + 'VAE_checkpoint.pth')
                print('Train Epoch: {} \tLoss: {:.6f}'.format(e,self.loss_during_training[-1]))


    def sample(self,num_imgs):

        with torch.no_grad():

            eps = torch.randn([num_imgs,self.dimz]).to(self.device)

            x_sample = self.decoder.decode(eps)

            return x_sample.to("cpu").detach()

Validate a pre-trained model

First, we're going to see a VAE model in action. The file VAE_checkpoint.pth, linked in Aula Global, contains the parameters of a VAE with dimz=50 trained for 200 epochs with var_x=0.1 and lr=1e-3.

The goal is now to create a VAE with the parameters contained in the file VAE_checkpoint.pth. To do so, instantiate the class VAE_extended with the restore argument set to True.

In [ ]:
path_to_folder = '/content/drive/My Drive/MM + PPCA/'
my_vae = VAE_extended(dimz = 50, var_x= 0.1, lr = 1e-3, restore = True ,save_folder = path_to_folder)

With the sample method of the class VAE_extended, generate 20 images from the probabilistic model. Note that these are images created by the model: they correspond to people who do not exist!

There are many variants of VAEs that improve on these results, but even the vanilla VAE implemented here does a pretty good job! Although the images contain plenty of artifacts, they also show many very realistic details. Note, too, how the model varies the features of the images (hair, background, smiles, etc.).

In [ ]:
x = my_vae.sample(20)
In [ ]:
imshow(x[0,:,:,:])

imshow(torchvision.utils.make_grid(x))

Compare a real image versus its reconstruction. Note that the model is placed on a GPU by default if one is available, so we must move the images there. To visualize reconstructed images, move the decoder output back to the CPU.

In [ ]:
z_batch,_,_ = my_vae.encoder.encode_and_sample(i.to(my_vae.device))

x_reconstructed = my_vae.decoder.decode(z_batch).detach().to('cpu')

# We plot the original image
imshow(i[0,:,:,:])

# And the reconstruction
imshow(x_reconstructed[0,:,:,:])

Using the VAE, we can also interpolate between images. Given the latent representations $\mathbf{z}_1$ and $\mathbf{z}_2$ of two images, we can visualize the images that correspond to different linear interpolations between both latent points. To this end, given $w=[0,0.1,0.2,\ldots,1]$, we will visualize the mean of $p(\mathbf{x}|\mathbf{z})$ for

$$\mathbf{z} = (1-w)\,\mathbf{z}_1+w\,\mathbf{z}_2$$

Interpolate between two images:

In [ ]:
# visualize the images to interpolate

img_1 = i[0,:,:,:]
img_2 = i[1,:,:,:]

imshow(img_1)

imshow(img_2)
In [ ]:
# z_1 and z_2 are saved in z_batch, computed previously

z_1 = z_batch[0,:]
z_2 = z_batch[1,:]

# Linear interpolation: the weights run from 1.0 down to 0.1,
# so the reconstructions move from image 1 towards image 2

weight = torch.Tensor(1.0-np.arange(0,10,1)/10)

reconstructed_image = [my_vae.decoder.decode(z_1*w+z_2*(1-w)).to('cpu').detach() for w in weight]
In [ ]:
# Visualize all images in the reconstructed_image list

for im in reconstructed_image:
    imshow(im.squeeze())

Using t-SNE over `z_batch`, show how real images are projected into a 2D space.

In [ ]:
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

def rescale(img):
    img = img / 2 + 0.5     # unnormalize to plot
    npimg = img.numpy()
    return np.transpose(npimg, (1, 2, 0))

def plot_latent_space_with_images(images,latent,xmin=-150,xmax=150,ymin=-150,ymax=150):

    # images --> Minibatch of images (numpy array!)
    # latent --> Matrix of 2D representations (numpy array!)

    f, ax = plt.subplots(1,1,figsize=(8, 8))
    # ax is a figure handle
    ax.clear()
    for i in range(len(images)):
        im = OffsetImage(rescale(images[i,:,:,:]))
        ab = AnnotationBbox(im, latent[i,:],frameon=False)
        ax.add_artist(ab)
    #We set the limits according to the maximum and minimum values found for the latent projections
    ax.set_xlim(xmin,xmax)
    ax.set_ylim(ymin,ymax)
    ax.set_title('Latent space Z with Images')
In [ ]:
from sklearn.manifold import TSNE

# Apply TSNE over z_batch

latent_tsne = TSNE(n_components=2).fit_transform(z_batch.detach().to('cpu'))
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
In [ ]:
plot_latent_space_with_images(i, latent_tsne)

Train the VAE from scratch

Let's see if we can train a VAE from scratch for a few epochs and generate images from the model.

In [ ]:
# Create the model (default var_x and lr)
my_vae_trained = VAE_extended(dimz = 100, epochs = 40)

# Train the model
my_vae_trained.trainloop(trainloader)
Train Epoch: 0 	Loss: 483821.523888
Train Epoch: 5 	Loss: 128573.141827
Train Epoch: 10 	Loss: 85271.208909
Train Epoch: 15 	Loss: 65216.461050
Train Epoch: 20 	Loss: 51617.274564
Train Epoch: 25 	Loss: 42146.586013
Train Epoch: 30 	Loss: 30253.109544
Train Epoch: 35 	Loss: 19972.311340
In [ ]:
x = my_vae_trained.sample(20)
imshow(x[0,:,:,:])
imshow(torchvision.utils.make_grid(x))

Wow!!