Modules

import pyblaze.nn as xnn

This module provides a variety of neural network layers that are not directly included in PyTorch. They are all implementations of torch.nn.Module.

Basic

Stacked LSTM

class pyblaze.nn.StackedLSTM(input_size, hidden_sizes, bias=True, batch_first=False, cudnn=True)[source]

The stacked LSTM is an extension of PyTorch's native LSTM that allows stacking LSTM layers with different hidden dimensions. Furthermore, it allows using an LSTM on a GPU without cuDNN. This is useful when higher-order gradients are required. In all other cases, it is best to use PyTorch's builtin LSTM.

__init__(input_size, hidden_sizes, bias=True, batch_first=False, cudnn=True)[source]

Initializes a new stacked LSTM according to the given parameters.

Parameters
  • input_size (int) – The dimension of the sequence’s elements.

  • hidden_sizes (list of int) – The dimensions of the stacked LSTM’s layers.

  • bias (bool, default: True) – Whether to use biases in the LSTM.

  • batch_first (bool, default: False) – Whether the first dimension of the input is the batch dimension (True) or the sequence dimension (False).

  • cudnn (bool, default: True) – Whether to use PyTorch's LSTM implementation, which uses cuDNN on Nvidia GPUs. You usually don't want to change the default value; however, PyTorch's default implementation does not allow higher-order gradients of the LSTMCell as of version 1.1.0. When this value is set to False, a (slower) implementation of an LSTM cell that allows higher-order gradients is used instead.

forward(inputs: torch.Tensor, initial_states: Optional[List[Tuple[torch.Tensor, torch.Tensor]]] = None, return_sequence: bool = True)[source]

Computes the forward pass through the stacked LSTM.

Parameters
  • inputs (torch.Tensor [S, B, N]) – The inputs fed to the LSTM one after the other. Sequence length S, batch size B, and input size N. If batch_first is set to True, the first and second dimension should be swapped.

  • initial_states (list of tuple of (torch.Tensor [H_i], torch.Tensor [H_i]), default: None) – The initial states for all LSTM layers. The length of the list must match the number of layers in the LSTM, the sizes of the states must match the hidden sizes of the LSTM layers. If None is given, the initial states are defaulted to all zeros.

  • return_sequence (bool, default: True) – Whether to return all outputs from the last LSTM layer or only the last one.

Returns

Depending on return_sequence, either all outputs or only the output of the last time step is returned. If the stacked LSTM was initialized with batch_first, the first and second dimensions are swapped when sequences are returned.

Return type

torch.Tensor [S, B, K] or torch.Tensor [B, K]
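
A minimal usage sketch based on the signatures above; all sizes below are illustrative:

import torch
import pyblaze.nn as xnn

# Two stacked LSTM layers with hidden sizes 64 and 32.
lstm = xnn.StackedLSTM(input_size=16, hidden_sizes=[64, 32])

# Sequence length 10, batch size 4, input size 16 (batch_first=False).
x = torch.randn(10, 4, 16)

# With return_sequence=True (the default), the output has shape [10, 4, 32];
# return_sequence=False would yield only the last output of shape [4, 32].
outputs = lstm(x)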

Stacked LSTM Cell

class pyblaze.nn.StackedLSTMCell(input_size, hidden_sizes, bias=True, cudnn=True)[source]

The StackedLSTMCell could easily be constructed from existing modules; however, a bug in PyTorch's JIT compiler prevents implementing anything where a stacked LSTM is used within a loop (see the following issue: https://github.com/pytorch/pytorch/issues/18143). Hence, this class provides a single time step of a stacked LSTM.

__init__(input_size, hidden_sizes, bias=True, cudnn=True)[source]

Initializes a new stacked LSTM cell.

Parameters
  • input_size (int) – The dimension of the input variables.

  • hidden_sizes (list of int) – The hidden dimensions of the stacked LSTM cells.

  • bias (bool, default: True) – Whether to use a bias term for the LSTM implementation.

  • cudnn (bool, default: True) – Whether to use cuDNN. In almost all cases, you don't want to set this value to False; however, you will need to do so if you want to compute higher-order derivatives of a network with a stacked LSTM cell.

forward(x: torch.Tensor, initial_states: Optional[List[Tuple[torch.Tensor, torch.Tensor]]] = None)[source]

Computes the new hidden states and cell states for each stacked cell.

Parameters
  • x (torch.Tensor [B, N]) – The input with batch size B and dimension N.

  • initial_states (list of tuple of (torch.Tensor [B, D], torch.Tensor [B, D]), default: None) – The initial states for each of the cells, where each state is expected to have batch size B and the respective hidden dimension D.

Returns

  • torch.Tensor [B, D] – The output, i.e. the hidden state of the deepest cell. Only given for convenience as it can be extracted from the other return value.

  • list of tuple of (torch.Tensor [B, D], torch.Tensor [B, D]) – The new hidden states and cell states for all cells.
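
A sketch of manually unrolling a sequence with the cell. Passing None for the initial states in the first step is assumed to default to zero states, as documented for StackedLSTM above; all sizes are illustrative:

import torch
import pyblaze.nn as xnn

cell = xnn.StackedLSTMCell(input_size=16, hidden_sizes=[64, 32])

x = torch.randn(10, 4, 16)  # sequence length 10, batch size 4, input size 16
states = None               # assumed to be initialized to zeros on the first step

outputs = []
for t in range(x.size(0)):
    # Each step returns the deepest cell's hidden state and the new per-layer states.
    output, states = cell(x[t], states)
    outputs.append(output)

outputs = torch.stack(outputs)  # [10, 4, 32]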

Linear Residual

class pyblaze.nn.LinearResidual(dim, hidden_dim, activation=ReLU(), bias=True)[source]

Residual module that models a two-layer MLP with nonlinearity and adds the input to the output:

\[f(x) = x + W_2 \sigma(W_1 x + b_1) + b_2\]

Usually, another nonlinearity is applied to the output.

__init__(dim, hidden_dim, activation=ReLU(), bias=True)[source]

Initializes a new residual module.

Parameters
  • dim (int) – The dimension of the input. Equals the dimension of the output.

  • hidden_dim (int) – The hidden dimension (i.e. the output dimension of \(W_1\)).

  • activation (torch.nn.Module, default: torch.nn.ReLU()) – An activation function to use (i.e. \(\sigma\) in the formula above).

  • bias (bool, default: True) – Whether to add biases to the linear layers (i.e. \(b_1\) and \(b_2\) in the formula above).

forward(x)[source]

Computes the output of the residual module.

Parameters

x (torch.Tensor [N, D]) – The input (batch size N, dimensionality D).

Returns

The processed output.

Return type

torch.Tensor [N, D]
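
A minimal usage sketch, applying another nonlinearity to the output as suggested above (sizes are illustrative):

import torch
import torch.nn as nn
import pyblaze.nn as xnn

block = nn.Sequential(
    xnn.LinearResidual(dim=32, hidden_dim=64),
    nn.ReLU(),
)

x = torch.randn(8, 32)  # batch size 8, dimensionality 32
y = block(x)            # shape is preserved: [8, 32]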

View

class pyblaze.nn.View(*dim)[source]

Utility module that views its input with a new shape. This module is usually used with torch.nn.Sequential when a linear layer's output needs to be reshaped, e.g. into a 2D input for a subsequent layer.

__init__(*dim)[source]

Initializes a new view module.

Parameters

dim (varargs of int) – The new dimension. May contain no more than one -1.

forward(x)[source]

Views the input as this module’s view dimension.

Parameters

x (torch.Tensor) – The tensor to view differently.

Returns

The input tensor with a new view on it.

Return type

torch.Tensor
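
A usage sketch within torch.nn.Sequential. It assumes that the view shape includes the batch dimension, with -1 inferring it; the layer sizes are illustrative:

import torch
import torch.nn as nn
import pyblaze.nn as xnn

decoder = nn.Sequential(
    nn.Linear(64, 16 * 7 * 7),
    xnn.View(-1, 16, 7, 7),  # reshape [N, 784] into [N, 16, 7, 7] feature maps
    nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),
)

z = torch.randn(8, 64)
img = decoder(z)  # [8, 1, 14, 14]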

Variational Autoencoder

Loss

class pyblaze.nn.VAELoss(loss)[source]

Loss for the reconstruction error of a variational autoencoder when the encoder parametrizes a Gaussian distribution. Taken from “Auto-Encoding Variational Bayes” (Kingma and Welling, 2014).

__init__(loss)[source]

Initializes a new loss for a variational autoencoder.

Parameters

loss (torch.nn.Module) – The loss to incur for the decoder’s output given (x_pred, x_true). This might e.g. be a BCE loss. The reduction must be ‘none’.

forward(x_pred, mu, logvar, x_true)[source]

Computes the loss of the decoder’s output.

Parameters
  • x_pred (torch.Tensor [N, ..]) – The outputs of the decoder (batch size N).

  • mu (torch.Tensor [N, D]) – The output for the means from the encoder (dimensionality D).

  • logvar (torch.Tensor [N, D]) – The output for the log-values of the diagonal entries of the covariance matrix.

  • x_true (torch.Tensor [N, ..]) – The target outputs for the decoder.

Returns

The incurred loss, computed as the given reconstruction loss plus a weighted KL-divergence.

Return type

torch.Tensor [1]
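
A minimal sketch for a decoder that outputs Bernoulli parameters, wrapping a BCE loss with reduction='none' as required above; the tensors are illustrative placeholders for encoder/decoder outputs:

import torch
import torch.nn as nn
import pyblaze.nn as xnn

loss_fn = xnn.VAELoss(nn.BCELoss(reduction='none'))

batch_size, latent_dim, data_dim = 8, 2, 784
x_true = torch.rand(batch_size, data_dim)     # targets in [0, 1]
x_pred = torch.rand(batch_size, data_dim)     # decoder output
mu = torch.randn(batch_size, latent_dim)      # encoder means
logvar = torch.randn(batch_size, latent_dim)  # encoder log-variances

loss = loss_fn(x_pred, mu, logvar, x_true)  # scalar loss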

Wasserstein GANs

Generator Loss

class pyblaze.nn.WassersteinLossGenerator[source]

Computes the loss of the generator in the Wasserstein GAN setting.

forward(out)[source]

Computes the loss for the generator given the outputs of the critic.

Parameters

out (torch.Tensor [N]) – The output values of the critic (batch size N).

Returns

The loss incurred for the generator.

Return type

torch.Tensor [1]
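
A minimal sketch; the critic outputs below are illustrative placeholders (in practice they are the critic's scores for generated samples, and the loss is backpropagated through the generator):

import torch
import pyblaze.nn as xnn

gen_loss_fn = xnn.WassersteinLossGenerator()

out_fake = torch.randn(16)    # critic outputs for 16 generated samples
loss = gen_loss_fn(out_fake)  # scalar loss for the generator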

Critic Loss

class pyblaze.nn.WassersteinLossCritic(gradient_penalty=None)[source]

Computes the loss of the critic in the Wasserstein GAN setting. This loss optionally includes a gradient penalty that should be used if no other regularization methods (weight clipping, spectral normalization, …) are used.

__init__(gradient_penalty=None)[source]

Initializes a new Wasserstein loss for a critic.

Parameters

gradient_penalty (torch.nn.Module, default: None) – A gradient penalty object that accepts fake and real inputs to the critic and computes the gradient penalty for it.

forward(out_fake, out_real, *inputs)[source]

Computes the loss for the critic given the outputs of itself and potentially a tuple of inputs.

Parameters
  • out_fake (torch.Tensor [N]) – The critic’s output for the fake inputs (batch size N).

  • out_real (torch.Tensor [N]) – The critic’s output for the real inputs.

  • inputs (tuple of (torch.Tensor [N, ..], torch.Tensor [N, ..])) – A tuple of (in_fake, in_real) that must be given if a gradient penalty is used.

Returns

  • torch.Tensor [1] – The loss incurred for the critic.

  • torch.Tensor [1] – The estimated Earth mover’s (Wasserstein-1) distance (equal to the detached negative loss if there is no gradient penalty).
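
A minimal sketch without a gradient penalty (e.g. when weight clipping or spectral normalization is used); the critic outputs are illustrative placeholders:

import torch
import pyblaze.nn as xnn

critic_loss_fn = xnn.WassersteinLossCritic()

out_fake = torch.randn(16)  # critic outputs for generated samples
out_real = torch.randn(16)  # critic outputs for real samples

loss, em_distance = critic_loss_fn(out_fake, out_real)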

Gradient Penalty

class pyblaze.nn.GradientPenalty(module, coefficient=10, lipschitz=False)[source]

Implementation of the gradient penalty as presented in “Improved Training of Wasserstein GANs” (Gulrajani et al., 2017). It ensures that the norm of the critic’s gradient is close to 1, ensuring Lipschitz continuity.

Optionally, the gradient penalty can be replaced by a Lipschitz penalty which does not penalize gradients smaller than one. It is taken from “On the Regularization of Wasserstein GANs” (Petzka et al., 2018).

__init__(module, coefficient=10, lipschitz=False)[source]

Initializes a new gradient penalty for the given module.

Parameters
  • module (torch.nn.Module) – The module whose gradient norm should be penalized.

  • coefficient (float, default: 10) – The coefficient for the gradient penalty. The default value is taken from the original WGAN-GP paper.

  • lipschitz (boolean, default: False) – Whether to use Lipschitz penalty instead of simple gradient penalty (not penalizing gradient norms smaller than 1).

forward(fake, real)[source]

Computes the loss incurred on the penalized module based on a batch of fake and real instances.

Parameters
  • fake (torch.Tensor [N, ..]) – The fake instances (batch size N).

  • real (torch.Tensor [N, ..]) – The real instances.

Returns

The gradient penalty times the penalty coefficient.

Return type

torch.Tensor [1]

interpolate(fake, real)[source]

Interpolates the given fake and real instances with an arbitrary alpha value weighting each batch sample. By default, it assumes that fake and real instances can be interpolated over the first dimension. This method may be overridden by subclasses for more complicated models.

Parameters
  • fake (torch.Tensor [N, ..]) – The fake instances passed to the module (batch size N).

  • real (torch.Tensor [N, ..]) – The real instances passed to the module.

Returns

  • torch.Tensor [N, …] – The interpolation (which must have requires_grad set to True).

  • torch.Tensor [N] – The module’s output for the interpolated fake and real instances.
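
A sketch of a WGAN-GP style critic update combining the gradient penalty with the critic loss from above. The critic below is a toy stand-in and all sizes are illustrative:

import torch
import torch.nn as nn
import pyblaze.nn as xnn

critic = nn.Sequential(nn.Linear(32, 64), nn.LeakyReLU(), nn.Linear(64, 1))

penalty = xnn.GradientPenalty(critic, coefficient=10)
critic_loss_fn = xnn.WassersteinLossCritic(gradient_penalty=penalty)

in_real = torch.randn(16, 32)
in_fake = torch.randn(16, 32)  # would normally come from the generator

out_real = critic(in_real).squeeze(-1)
out_fake = critic(in_fake).squeeze(-1)

# The (fake, real) inputs are forwarded to the gradient penalty.
loss, em_distance = critic_loss_fn(out_fake, out_real, in_fake, in_real)
loss.backward()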

Density Estimation

Masked Autoencoder

class pyblaze.nn.MADE(*dims, activation=LeakyReLU(negative_slope=0.01))[source]

Masked autoencoder for distribution estimation (MADE) as introduced in MADE: Masked Autoencoder for Distribution Estimation (Germain et al., 2015). It consists of a series of masked linear layers with a given non-linearity between them.

__init__(*dims, activation=LeakyReLU(negative_slope=0.01))[source]

Initializes a new MADE model as a sequence of masked linear layers.

Parameters
  • dims (varargs of int) – Dimensions of input (first), output (last) and hidden layers. At least one hidden layer must be defined, i.e. at least 3 dimensions must be given. The output dimension must be equal to the input dimension or a multiple of it. Hidden dimensions should be a multiple of the input dimension unless a seed for random initialization is given.

  • activation (torch.nn.Module, default: torch.nn.LeakyReLU()) – An activation function to be used after linear layers (except for the output layer). This module is shared for all hidden layers.

forward(x)[source]

Computes the outputs of the MADE model.

Parameters

x (torch.Tensor [.., D]) – The input (input dimension D).

Returns

The output (output dimension E).

Return type

torch.Tensor [.., E]
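
A minimal sketch with input dimension 4, a hidden layer whose size is a multiple of the input dimension, and an output dimension of 8 (e.g. two parameters per input dimension):

import torch
import pyblaze.nn as xnn

made = xnn.MADE(4, 16, 8)

x = torch.randn(32, 4)
out = made(x)  # [32, 8]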

Normal Loss

class pyblaze.nn.TransformedNormalLoss(reduction='mean')[source]

This loss returns the negative log-likelihood (NLL) of some data that has been transformed via invertible transformations. The NLL is computed via the negative sum of the log-determinant of the transformations and the log-probability of observing the output under a standard Normal distribution. This loss is typically used to fit a normalizing flow.

__init__(reduction='mean')[source]

Initializes a new NLL loss.

Parameters

reduction (str, default: 'mean') – The kind of reduction to perform. Must be one of [‘mean’, ‘sum’, ‘none’].

forward(z, log_det)[source]

Computes the NLL for the given transformed values.

Parameters
  • z (torch.Tensor [N, D]) – The output values of the transformations (batch size N, dimensionality D).

  • log_det (torch.Tensor [N]) – The log-determinants of the transformations for all values.

Returns

The mean NLL for all given values.

Return type

torch.Tensor [1]
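
A minimal sketch; the tensors stand in for the outputs and log-determinants of a normalizing flow (see the NormalizingFlow example below):

import torch
import pyblaze.nn as xnn

nll = xnn.TransformedNormalLoss()

z = torch.randn(128, 2)     # transformed samples
log_det = torch.zeros(128)  # log-determinants of the transformations

loss = nll(z, log_det)  # scalar mean NLL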

GMM Loss

class pyblaze.nn.TransformedGmmLoss(means, trainable=False, reduction='mean')[source]

This loss returns the negative log-likelihood (NLL) of some data that has been transformed via invertible transformations. The NLL is computed via the negative sum of the log-determinant of the transformations and the log-probability of observing the output under a GMM with predefined means and unit variances. The simple alternative to this loss is the TransformedNormalLoss.

__init__(means, trainable=False, reduction='mean')[source]

Initializes a new GMM loss.

Parameters
  • means (torch.Tensor [N, D]) – The means of the GMM. For random initialization of the means, consider using pyblaze.nn.functional.random_gmm().

  • trainable (bool, default: False) – Whether the means are trainable.

  • reduction (str, default: 'mean') – The kind of reduction to perform. Must be one of [‘mean’, ‘sum’, ‘none’].

forward(z, log_det)[source]

Computes the NLL for the given transformed values.

Parameters
  • z (torch.Tensor [N, D]) – The output values of the transformations (batch size N, dimensionality D).

  • log_det (torch.Tensor [N]) – The log-determinants of the transformations for all values.

Returns

The mean NLL for all given values.

Return type

torch.Tensor [1]
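
A minimal sketch with fixed means; plain random means are used here for illustration only (consider pyblaze.nn.functional.random_gmm() as mentioned above):

import torch
import pyblaze.nn as xnn

means = torch.randn(5, 2)  # 5 components in 2 dimensions
nll = xnn.TransformedGmmLoss(means)

z = torch.randn(128, 2)     # transformed samples
log_det = torch.zeros(128)  # log-determinants of the transformations

loss = nll(z, log_det)  # scalar mean NLL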

Normalizing Flows

class pyblaze.nn.NormalizingFlow(transforms)[source]

In general, a normalizing flow is a module to transform an initial density into another one (usually a more complex one) via a sequence of invertible transformations.

__init__(transforms)[source]

Initializes a new normalizing flow applying the given transformations.

Parameters

transforms (list of torch.nn.Module) – Transformations whose forward method yields the transformed value and the log-determinant of the applied transformation. All transformations must have the same dimension.

forward(z, condition=None)[source]

Computes the outputs and log-determinants for the given samples after applying this flow's transformations.

Parameters
  • z (torch.Tensor [N, D]) – The input value (batch size N, dimensionality D).

  • condition (torch.Tensor [N, C]) – An additional condition vector on which the transforms are conditioned. Causes failure if any of the underlying transforms does not support conditioning.

Returns

  • torch.Tensor [N, D] – The transformed values.

  • torch.Tensor [N] – The log-determinants of the transformation for all values.
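
A sketch of a single training step that fits a small flow with the TransformedNormalLoss from above. It uses the PlanarTransform documented below and assumes that the flow registers its transforms as submodules so that flow.parameters() yields their parameters; all sizes are illustrative:

import torch
import pyblaze.nn as xnn

dim = 2
flow = xnn.NormalizingFlow([xnn.PlanarTransform(dim) for _ in range(8)])
nll = xnn.TransformedNormalLoss()
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-3)

x = torch.randn(256, dim)  # a batch of training data
z, log_det = flow(x)       # outputs and accumulated log-determinants
loss = nll(z, log_det)

optimizer.zero_grad()
loss.backward()
optimizer.step()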

Affine Transform

class pyblaze.nn.AffineTransform(dim)[source]

An affine transformation may be used to transform an input variable linearly. It computes the following function for \(\mathbf{z} \in \mathbb{R}^D\):

\[f_{\mathbf{a}, \mathbf{b}}(\mathbf{z}) = \mathbf{a} \odot \mathbf{z} + \mathbf{b}\]

with \(\mathbf{a} \in \mathbb{R}^D_+\) and \(\mathbf{b} \in \mathbb{R}^D\).

The log-determinant of its Jacobian is given as follows:

\[\sum_{k=1}^D{\log{a_k}}\]

Although this transformation is theoretically invertible, the inverse function is not implemented at the moment.

__init__(dim)[source]

Initializes a new affine transformation.

Parameters

dim (int) – The dimension of the inputs to the function.

reset_parameters()[source]

Resets this module’s parameters. All parameters are sampled uniformly from [0, 1].

forward(z)[source]

Transforms the given input.

Parameters

z (torch.Tensor [N, D]) – The given input (batch size N, transform dimensionality D).

Returns

  • torch.Tensor [N, D] – The transformed input.

  • torch.Tensor [N] – The log-determinants of the Jacobian evaluated at the input.

Planar Transform

class pyblaze.nn.PlanarTransform(dim)[source]

A planar transformation may be used to split the input along a hyperplane. It was introduced in “Variational Inference with Normalizing Flows” (Rezende and Mohamed, 2015). It computes the following function for \(\mathbf{z} \in \mathbb{R}^D\) (although the planar transform was introduced for an arbitrary activation function \(\sigma\), this transform restricts the usage to \(\tanh\)):

\[f_{\mathbf{u}, \mathbf{w}, b}(\mathbf{z}) = \mathbf{z} + \mathbf{u} \tanh(\mathbf{w}^T \mathbf{z} + b)\]

with \(\mathbf{u}, \mathbf{w} \in \mathbb{R}^D\) and \(b \in \mathbb{R}\).

The log-determinant of its Jacobian is given as follows:

\[\log\left| 1 + \mathbf{u}^T ((1 - \tanh^2(\mathbf{w}^T \mathbf{z} + b))\mathbf{w}) \right|\]

This transform is invertible for its outputs.

__init__(dim)[source]

Initializes a new planar transformation.

Parameters

dim (int) – The dimension of the inputs to the function.

reset_parameters()[source]

Resets this module’s parameters. All parameters are sampled uniformly from [0, 1].

forward(z)[source]

Transforms the given input.

Parameters

z (torch.Tensor [N, D]) – The given input (batch size N, transform dimensionality D).

Returns

  • torch.Tensor [N, D] – The transformed input.

  • torch.Tensor [N] – The log-determinants of the Jacobian evaluated at z.

Radial Transform

class pyblaze.nn.RadialTransform(dim)[source]

A radial transformation may be used to apply radial contractions and expansions around a reference point. It was introduced in “Variational Inference with Normalizing Flows” (Rezende and Mohamed, 2015). It computes the following function for \(\mathbf{z} \in \mathbb{R}^D\):

\[f_{\mathbf{z}_0, \alpha, \beta}(\mathbf{z}) = \mathbf{z} + \beta h(\alpha, r) (\mathbf{z} - \mathbf{z}_0)\]

with \(\mathbf{z}_0 \in \mathbb{R}^D\), \(\alpha \in \mathbb{R}^+\), \(\beta \in \mathbb{R}\), \(r = ||\mathbf{z} - \mathbf{z}_0||_2\) and \(h(\alpha, r) = (\alpha + r)^{-1}\).

The log-determinant of its Jacobian is given as follows:

\[(D - 1) \log\left(1 + \beta h(\alpha, r)\right) + \log\left(1 + \beta h(\alpha, r) - \beta h^2(\alpha, r) r \right)\]

This transform is invertible for its outputs, however, there does not exist a closed-form solution for computing the inverse in general.

__init__(dim)[source]

Initializes a new radial transformation.

Parameters

dim (int) – The dimension of the inputs to the function.

reset_parameters()[source]

Resets this module’s parameters. All parameters are sampled from a standard Normal distribution.

forward(z)[source]

Transforms the given input.

Parameters

z (torch.Tensor [N, D]) – The given input (batch size N, transform dimensionality D).

Returns

  • torch.Tensor [N, D] – The transformed input.

  • torch.Tensor [N] – The log-determinants of the Jacobian evaluated at z.

Affine Coupling Transform 1D

class pyblaze.nn.AffineCouplingTransform1d(dim, fixed_dim, net, constrain_scale=False)[source]

An affine coupling transforms the input by splitting it into two parts and transforming the second part by an arbitrary function depending on the first part. It was introduced in “Density Estimation Using Real NVP” (Dinh et al., 2017). It computes the following function for \(\mathbf{z} \in \mathbb{R}^D\) and a dimension \(d < D\):

\[f_{\mathbf{\omega}_s, \mathbf{\omega}_m}(\mathbf{z}) = [\mathbf{z}_{1:d}, \mathbf{z}_{d+1:D} \odot \exp(g_{\mathbf{\omega}_s}(\mathbf{z}_{1:d})) + h_{\mathbf{\omega}_m}(\mathbf{z}_{1:d})]^T\]

with \(g, h: \mathbb{R}^d \rightarrow \mathbb{R}^{D-d}\) being arbitrary parametrized functions (e.g. neural networks) computing the log-scale and the translation, respectively.

The log-determinant of its Jacobian is given as follows:

\[\sum_{k=1}^{D-d}{\left[ g_{\mathbf{\omega}_s}(\mathbf{z}_{1:d}) \right]_k}\]

Additionally, this transform can be easily conditioned on another input variable \(\mathbf{x}\) by conditioning the functions \(g, h\) on it. This transform is invertible and the inverse computation will be added in the future.

Note

As only part of the input is transformed, consider using this class with the reverse flag set alternately.

__init__(dim, fixed_dim, net, constrain_scale=False)[source]

Initializes a new affine coupling transformation.

Parameters
  • dim (int) – The dimensionality of the input.

  • fixed_dim (int) – The dimensionality of the input space that is not transformed. Must be smaller than the dimension.

  • net (torch.nn.Module [N, F] -> [N, F*2]) – An arbitrary neural network taking as input the fixed part of the input and outputting a mean and a log scale used for scaling and translating the affine part of the input, respectively, as a single tensor which will be split. In case this affine coupling is used with conditioning, the net’s input dimension should be modified accordingly (batch size N, fixed dimension F).

  • constrain_scale (bool, default: False) – Whether to constrain the scale parameter that the output is multiplied by. This should be set for deep normalizing flows where no batch normalization is used.

forward(z, condition=None)[source]

Transforms the given input.

Parameters
  • z (torch.Tensor [N, D]) – The given input (batch size N, dimensionality D).

  • condition (torch.Tensor [N, C]) – An optional tensor on which this layer’s net is conditioned. This value will be concatenated with the part of z that is passed to this layer’s net (condition dimension C).

Returns

  • torch.Tensor [N, D] – The transformed input.

  • torch.Tensor [N] – The log-determinants of the Jacobian evaluated at z.
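
A sketch of a coupling layer on 4-dimensional inputs with an even split (fixed_dim=2), so the net maps the 2 fixed dimensions to a scale and a translation for the remaining 2 dimensions; the net architecture is illustrative:

import torch
import torch.nn as nn
import pyblaze.nn as xnn

net = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(), nn.Linear(32, 4))
coupling = xnn.AffineCouplingTransform1d(dim=4, fixed_dim=2, net=net)

z = torch.randn(64, 4)
out, log_det = coupling(z)  # [64, 4], [64]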

Masked Autoregressive Transform 1D

class pyblaze.nn.MaskedAutoregressiveTransform1d(dim, *hidden_dims, activation=LeakyReLU(negative_slope=0.01), constrain_scale=False)[source]

1-dimensional Masked Autoregressive Transform as introduced in Masked Autoregressive Flow for Density Estimation (Papamakarios et al., 2018).

__init__(dim, *hidden_dims, activation=LeakyReLU(negative_slope=0.01), constrain_scale=False)[source]

Initializes a new MAF transform that is backed by a pyblaze.nn.MADE model.

Parameters
  • dim (int) – The dimension of the inputs.

  • hidden_dims (varargs of int) – The hidden dimensions of the MADE model.

  • activation (torch.nn.Module, default: torch.nn.LeakyReLU()) – The activation function to use in the MADE model.

  • constrain_scale (bool, default: False) – Whether to constrain the scale parameter that the output is multiplied by. This should be set for deep normalizing flows where no batch normalization is used.

forward(x)[source]

Transforms the given input.

Parameters

x (torch.Tensor [N, D]) – The given input (batch size N, dimensionality D).

Returns

  • torch.Tensor [N, D] – The transformed input.

  • torch.Tensor [N] – The log-determinants of the Jacobian evaluated at the input.
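
A minimal sketch; the hidden dimensions are chosen as multiples of the input dimension, as recommended for the underlying MADE model:

import torch
import pyblaze.nn as xnn

maf = xnn.MaskedAutoregressiveTransform1d(3, 24, 24)

x = torch.randn(64, 3)
z, log_det = maf(x)  # [64, 3], [64]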

BatchNorm Transform 1D

class pyblaze.nn.BatchNormTransform1d(dim, eps=1e-05, momentum=0.1)[source]

1-dimensional Batch Normalization layer for stabilizing deep normalizing flows. It was first introduced in Density Estimation Using Real NVP (Dinh et al., 2017).

__init__(dim, eps=1e-05, momentum=0.1)[source]

Initializes a new batch normalization layer for one-dimensional vectors of the given dimension.

Parameters
  • dim (int) – The dimension of the inputs.

  • eps (float, default: 1e-5) – A small value added in the denominator for numerical stability.

  • momentum (float, default: 0.1) – Value used for calculating running average statistics.

reset_parameters()[source]

Resets this module’s parameters.

forward(z)[source]

Transforms the given input.

Note

During testing, for inputs that differ greatly from the inputs seen during training, this module is generally prone to outputting non-finite float values. In that case, these inputs are considered to be “impossible” to observe: the transformed output is set to all zeros and the log-determinant is set to -inf.

Parameters

z (torch.Tensor [N, D]) – The given input (batch size N, dimensionality D).

Returns

  • torch.Tensor [N, D] – The transformed input.

  • torch.Tensor [N] – The log-determinants of the Jacobian evaluated at z.
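
A sketch of interleaving batch normalization with masked autoregressive transforms in a deeper flow, as suggested for stabilization above; depth and sizes are illustrative:

import torch
import pyblaze.nn as xnn

dim = 3
layers = []
for _ in range(4):
    layers.append(xnn.MaskedAutoregressiveTransform1d(dim, 24, 24))
    layers.append(xnn.BatchNormTransform1d(dim))
flow = xnn.NormalizingFlow(layers)

x = torch.randn(128, dim)
z, log_det = flow(x)  # [128, 3], [128]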