Modules

import pyblaze.nn as xnn

The modules module provides a variety of neural network layers that are not directly included in PyTorch. They are all implementations of torch.nn.Module.
Basic

Stacked LSTM

class pyblaze.nn.StackedLSTM(input_size, hidden_sizes, bias=True, batch_first=False, cudnn=True)
The stacked LSTM is an extension of PyTorch's native LSTM that allows stacking LSTM layers with different hidden dimensions. Furthermore, it allows using an LSTM on a GPU without cuDNN. This is useful when higher-order gradients are required. In all other cases, it is best to use PyTorch's built-in LSTM.
__init__(input_size, hidden_sizes, bias=True, batch_first=False, cudnn=True)
Initializes a new stacked LSTM according to the given parameters.
- Parameters
input_size (int) – The dimension of the sequence’s elements.
hidden_sizes (list of int) – The dimensions of the stacked LSTM’s layers.
bias (bool, default: True) – Whether to use biases in the LSTM.
batch_first (bool, default: False) – Whether the first dimension of the input is the batch (True) or the sequence (False).
cudnn (bool, default: True) – Whether to use PyTorch's LSTM implementation, which uses cuDNN on Nvidia GPUs. You usually don't want to change the default value; however, PyTorch's default implementation does not allow higher-order gradients of the LSTMCell as of version 1.1.0. When this value is set to False, a (slower) LSTM cell implementation that allows higher-order gradients is used instead.
forward(inputs: torch.Tensor, initial_states: Optional[List[Tuple[torch.Tensor, torch.Tensor]]] = None, return_sequence: bool = True)
Computes the forward pass through the stacked LSTM.
- Parameters
inputs (torch.Tensor [S, B, N]) – The inputs fed to the LSTM one after the other. Sequence length S, batch size B, and input size N. If batch_first is set to True, the first and second dimension should be swapped.
initial_states (list of tuple of (torch.Tensor [H_i], torch.Tensor [H_i]), default: None) – The initial states for all LSTM layers. The length of the list must match the number of layers in the LSTM, the sizes of the states must match the hidden sizes of the LSTM layers. If None is given, the initial states are defaulted to all zeros.
return_sequence (bool, default: True) – Whether to return all outputs from the last LSTM layer or only the last one.
- Returns
Depending on whether sequences are returned, either all outputs or only the output from the last cell are returned. If the stacked LSTM was initialized with batch_first, the first and second dimension are swapped when sequences are returned.
- Return type
torch.Tensor [S, B, K] or torch.Tensor [B, K]
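For illustration, a minimal usage sketch based on the documented signature and shapes (the concrete sizes below are arbitrary):

import torch
import pyblaze.nn as xnn

# Two stacked LSTM layers with hidden sizes 64 and 32; input vectors have size 16.
lstm = xnn.StackedLSTM(input_size=16, hidden_sizes=[64, 32])

inputs = torch.randn(20, 8, 16)              # [S, B, N]: sequence length 20, batch size 8
outputs = lstm(inputs)                       # [20, 8, 32] since return_sequence=True
last = lstm(inputs, return_sequence=False)   # [8, 32]: output of the last time step only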
Stacked LSTM Cell

class pyblaze.nn.StackedLSTMCell(input_size, hidden_sizes, bias=True, cudnn=True)
In principle, the StackedLSTMCell can easily be constructed from existing modules; however, a bug in PyTorch's JIT compiler prevents implementing anything where a stacked LSTM is used within a loop (see the following issue: https://github.com/pytorch/pytorch/issues/18143). Hence, this class provides a single time step of a stacked LSTM.
__init__(input_size, hidden_sizes, bias=True, cudnn=True)
Initializes a new stacked LSTM cell.
- Parameters
input_size (int) – The dimension of the input variables.
hidden_sizes (list of int) – The hidden dimensions of the stacked LSTM's layers.
bias (bool, default: True) – Whether to use a bias term for the LSTM implementation.
cudnn (bool, default: True) – Whether to use cuDNN, i.e. PyTorch's native LSTM implementation. In almost all cases, you don't want to set this value to False; however, you will need to do so if you want to compute higher-order derivatives of a network with a stacked LSTM cell.
forward(x: torch.Tensor, initial_states: Optional[List[Tuple[torch.Tensor, torch.Tensor]]] = None)
Computes the new hidden states and cell states for each stacked cell.
- Parameters
x (torch.Tensor [B, N]) – The input with batch size B and dimension N.
initial_states (list of tuple of (torch.Tensor [B, D], torch.Tensor [B, D]), default: None) – The states for each of the cells, where each state is expected to have batch size B and (respective) hidden dimension D.
- Returns
torch.Tensor [B, D] – The output, i.e. the hidden state of the deepest cell. Only given for convenience as it can be extracted from the other return value.
list of tuple of (torch.Tensor [B, D], torch.Tensor [B, D]) – The new hidden states and cell states for all cells.
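A sketch of manually unrolling a sequence with the cell (shapes follow the documentation above; the sizes are arbitrary):

import torch
import pyblaze.nn as xnn

cell = xnn.StackedLSTMCell(input_size=16, hidden_sizes=[64, 32])

sequence = torch.randn(20, 8, 16)    # [S, B, N]
states = None                        # defaults to all zeros
for x_t in sequence:                 # one time step per iteration, x_t is [B, N]
    output, states = cell(x_t, states)
# output is the hidden state of the deepest cell, i.e. of shape [8, 32]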
Linear Residual

class pyblaze.nn.LinearResidual(dim, hidden_dim, activation=ReLU(), bias=True)
Residual module that models a two-layer MLP with a nonlinearity and adds the input to the output:

\[f(x) = x + W_2 \sigma(W_1 x + b_1) + b_2\]

Usually, another nonlinearity is applied to the output.
__init__(dim, hidden_dim, activation=ReLU(), bias=True)
Initializes a new residual module.
- Parameters
dim (int) – The dimension of the input. Equals the dimension of the output.
hidden_dim (int) – The hidden dimension (i.e. the output dimension of \(W_1\)).
activation (torch.nn.Module, default: torch.nn.ReLU()) – An activation function to use (i.e. \(\sigma\) in the formula above).
bias (bool, default: True) – Whether to add biases to the linear layers (i.e. \(b_1\) and \(b_2\) in the formula above).
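A minimal sketch of using the residual block (sizes are illustrative):

import torch
import pyblaze.nn as xnn

residual = xnn.LinearResidual(dim=128, hidden_dim=256)

x = torch.randn(32, 128)
y = residual(x)   # same shape as the input: [32, 128]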
View

class pyblaze.nn.View(*dim)
Utility module that views the input with a new shape. This module is usually used within torch.nn.Sequential, e.g. when the output of a linear layer needs to be reshaped to a 2D input or the like.
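A sketch of the typical use inside torch.nn.Sequential. This assumes that the given dimensions are passed directly to torch.Tensor.view (not stated explicitly above), so -1 is used to infer the batch dimension:

import torch
import pyblaze.nn as xnn

model = torch.nn.Sequential(
    torch.nn.Linear(100, 64 * 7 * 7),
    xnn.View(-1, 64, 7, 7),   # reshape [N, 3136] into [N, 64, 7, 7] (assumed view semantics)
    torch.nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
)

out = model(torch.randn(4, 100))   # [4, 32, 14, 14]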
Variational Autoencoder

Loss

class pyblaze.nn.VAELoss(loss)
Loss for the reconstruction error of a variational autoencoder when the encoder parametrizes a Gaussian distribution. Taken from "Auto-Encoding Variational Bayes" (Kingma and Welling, 2014).
__init__(loss)
Initializes a new loss for a variational autoencoder.
- Parameters
loss (torch.nn.Module) – The loss to incur for the decoder’s output given (x_pred, x_true). This might e.g. be a BCE loss. The reduction must be ‘none’.
forward(x_pred, mu, logvar, x_true)
Computes the loss of the decoder's output.
- Parameters
x_pred (torch.Tensor [N, ..]) – The outputs of the decoder (batch size N).
mu (torch.Tensor [N, D]) – The output for the means from the encoder (dimensionality D).
logvar (torch.Tensor [N, D]) – The output for the log-values of the diagonal entries of the covariance matrix.
x_true (torch.Tensor [N, ..]) – The target outputs for the decoder.
- Returns
The incurred loss, computed as the actual reconstruction loss plus a weighted KL-divergence.
- Return type
torch.Tensor [1]
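A sketch of how the loss is wired up (the tensors below are random stand-ins for encoder and decoder outputs; note that the reduction of the inner loss must be 'none'):

import torch
import pyblaze.nn as xnn

criterion = xnn.VAELoss(torch.nn.BCELoss(reduction='none'))

x_true = torch.rand(16, 784)                            # e.g. flattened binary images
x_pred = torch.rand(16, 784)                            # decoder output
mu, logvar = torch.randn(16, 32), torch.randn(16, 32)   # encoder outputs

loss = criterion(x_pred, mu, logvar, x_true)            # scalar tensor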
Wasserstein GANs

Generator Loss

Critic Loss

class pyblaze.nn.WassersteinLossCritic(gradient_penalty=None)
Computes the loss of the critic in the Wasserstein GAN setting. This loss optionally includes a gradient penalty that should be used if no other regularization methods (weight clipping, spectral normalization, …) are used.
__init__(gradient_penalty=None)
Initializes a new Wasserstein loss for a critic.
- Parameters
gradient_penalty (torch.nn.Module, default: None) – A gradient penalty object that accepts fake and real inputs to the critic and computes the gradient penalty for them.
forward(out_fake, out_real, *inputs)
Computes the loss for the critic given its outputs and potentially a tuple of inputs.
- Parameters
out_fake (torch.Tensor [N]) – The critic’s output for the fake inputs (batch size N).
out_real (torch.Tensor [N]) – The critic’s output for the real inputs.
inputs (tuple of (torch.Tensor [N, ..], torch.Tensor [N, ..])) – A tuple of (in_fake, in_real) that must be given if a gradient penalty is used.
- Returns
torch.Tensor [1] – The loss incurred for the critic.
torch.Tensor [1] – The estimated Earth mover’s (Wasserstein-1) distance (equal to the detached negative loss if there is no gradient penalty).
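A sketch of a critic update with a gradient penalty (the critic and the data batches are placeholders; only the loss wiring follows the documentation above):

import torch
import pyblaze.nn as xnn

critic = torch.nn.Sequential(torch.nn.Linear(64, 1))
criterion = xnn.WassersteinLossCritic(xnn.GradientPenalty(critic))

real = torch.randn(32, 64)
fake = torch.randn(32, 64)            # would normally be generator output

out_fake = critic(fake).squeeze(-1)   # [N]
out_real = critic(real).squeeze(-1)   # [N]
loss, w_distance = criterion(out_fake, out_real, fake, real)
loss.backward()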
Gradient Penalty

class pyblaze.nn.GradientPenalty(module, coefficient=10, lipschitz=False)
Implementation of the gradient penalty as presented in "Improved Training of Wasserstein GANs" (Gulrajani et al., 2017). It encourages the norm of the critic's gradient to be close to 1, thus ensuring Lipschitz continuity.
Optionally, the gradient penalty can be replaced by a Lipschitz penalty which does not penalize gradients smaller than one. It is taken from “On the Regularization of Wasserstein GANs” (Petzka et al., 2018).
__init__(module, coefficient=10, lipschitz=False)
Initializes a new gradient penalty for the given module.
- Parameters
module (torch.nn.Module) – The module whose gradient norm should be penalized.
coefficient (float, default: 10) – The coefficient for the gradient penalty. The default value is taken from the original WGAN-GP paper.
lipschitz (boolean, default: False) – Whether to use Lipschitz penalty instead of simple gradient penalty (not penalizing gradient norms smaller than 1).
forward(fake, real)
Computes the loss incurred on the penalized module based on a batch of fake and real instances.
- Parameters
fake (torch.Tensor [N, ..]) – The fake instances (batch size N).
real (torch.Tensor [N, ..]) – The real instances.
- Returns
The gradient penalty times the penalty coefficient.
- Return type
torch.Tensor [1]
interpolate(fake, real)
Interpolates the given fake and real instances with an arbitrary alpha value weighing each batch sample. By default, it assumes that fake and real instances can be interpolated over the first dimension. This method may be overridden by subclasses for more complicated models.
- Parameters
fake (torch.Tensor [N, ..]) – The fake instances passed to the module (batch size N).
real (torch.Tensor [N, ..]) – The real instances passed to the module.
- Returns
torch.Tensor [N, …] – The interpolation (which must have requires_grad set to True).
torch.Tensor [N] – The module’s output for the interpolated fake and real instances.
Density Estimation

Masked Autoencoder

class pyblaze.nn.MADE(*dims, activation=LeakyReLU(negative_slope=0.01))
Masked autoencoder for distribution estimation (MADE) as introduced in "MADE: Masked Autoencoder for Distribution Estimation" (Germain et al., 2015). It consists of a series of masked linear layers and a given non-linearity between them.
__init__(*dims, activation=LeakyReLU(negative_slope=0.01))
Initializes a new MADE model as a sequence of masked linear layers.
- Parameters
dims (varargs of int) – Dimensions of input (first), output (last) and hidden layers. At least one hidden layer must be defined, i.e. at least 3 dimensions must be given. The output dimension must be equal to the input dimension or a multiple of it. Hidden dimensions should be a multiple of the input dimension unless a seed for random initialization is given.
activation (torch.nn.Module, default: torch.nn.LeakyReLU()) – An activation function to be used after linear layers (except for the output layer). This module is shared for all hidden layers.
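A minimal sketch (dimensions are illustrative; hidden dimensions are chosen as multiples of the input dimension as recommended above):

import torch
import pyblaze.nn as xnn

made = xnn.MADE(10, 50, 50, 10)   # input 10, two hidden layers of 50, output 10

x = torch.randn(128, 10)
out = made(x)                     # [128, 10]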
Normal Loss

class pyblaze.nn.TransformedNormalLoss(reduction='mean')
This loss returns the negative log-likelihood (NLL) of some data that has been transformed via invertible transformations. The NLL is computed via the negative sum of the log-determinant of the transformations and the log-probability of observing the output under a standard Normal distribution. This loss is typically used to fit a normalizing flow.
__init__(reduction='mean')
Initializes a new NLL loss.
- Parameters
reduction (str, default: 'mean') – The kind of reduction to perform. Must be one of [‘mean’, ‘sum’, ‘none’].
forward(z, log_det)
Computes the NLL for the given transformed values.
- Parameters
z (torch.Tensor [N, D]) – The output values of the transformations (batch size N, dimensionality D).
log_det (torch.Tensor [N]) – The log-determinants of the transformations for all values.
- Returns
The mean NLL for all given values.
- Return type
torch.Tensor [1]
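For illustration, evaluating the loss on stand-in outputs of an invertible transformation:

import torch
import pyblaze.nn as xnn

criterion = xnn.TransformedNormalLoss()

z = torch.randn(64, 2)        # transformed values
log_det = torch.randn(64)     # log-determinants of the transformation
nll = criterion(z, log_det)   # scalar (mean reduction)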
GMM Loss

class pyblaze.nn.TransformedGmmLoss(means, trainable=False, reduction='mean')
This loss returns the negative log-likelihood (NLL) of some data that has been transformed via invertible transformations. The NLL is computed via the negative sum of the log-determinant of the transformations and the log-probability of observing the output under a GMM with predefined means and unit variances. The simpler alternative to this loss is TransformedNormalLoss.
__init__(means, trainable=False, reduction='mean')
Initializes a new GMM loss.
- Parameters
means (torch.Tensor [N, D]) – The means of the GMM. For random initialization of the means, consider using pyblaze.nn.functional.random_gmm().
trainable (bool, default: False) – Whether the means are trainable.
reduction (str, default: 'mean') – The kind of reduction to perform. Must be one of [‘mean’, ‘sum’, ‘none’].
forward(z, log_det)
Computes the NLL for the given transformed values.
- Parameters
z (torch.Tensor [N, D]) – The output values of the transformations (batch size N, dimensionality D).
log_det (torch.Tensor [N]) – The log-determinants of the transformations for all values.
- Returns
The mean NLL for all given values.
- Return type
torch.Tensor [1]
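Analogous to the Normal loss, but with a set of component means (random here, purely for illustration):

import torch
import pyblaze.nn as xnn

means = torch.randn(5, 2)     # 5 components in 2 dimensions
criterion = xnn.TransformedGmmLoss(means)

z = torch.randn(64, 2)
log_det = torch.randn(64)
nll = criterion(z, log_det)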
Normalizing Flows

class pyblaze.nn.NormalizingFlow(transforms)
In general, a normalizing flow is a module that transforms an initial density into another (usually more complex) one via a sequence of invertible transformations.
__init__(transforms)
Initializes a new normalizing flow applying the given transformations.
- Parameters
transforms (list of torch.nn.Module) – Transformations whose forward method yields the transformed value and the log-determinant of the applied transformation. All transformations must have the same dimension.
forward(z, condition=None)
Computes the outputs and log-determinants for the given samples after applying this flow's transformations.
- Parameters
z (torch.Tensor [N, D]) – The input value (batch size N, dimensionality D).
condition (torch.Tensor [N, C]) – An additional condition vector on which the transforms are conditioned. Causes failure if any of the underlying transforms does not support conditioning.
- Returns
torch.Tensor [N, D] – The transformed values.
torch.Tensor [N] – The log-determinants of the transformation for all values.
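A sketch of a small flow fit with TransformedNormalLoss (the choice and number of transforms as well as all dimensions are illustrative):

import torch
import pyblaze.nn as xnn

flow = xnn.NormalizingFlow([
    xnn.MaskedAutoregressiveTransform1d(2, 32, 32),
    xnn.BatchNormTransform1d(2),
    xnn.MaskedAutoregressiveTransform1d(2, 32, 32),
])
criterion = xnn.TransformedNormalLoss()
optimizer = torch.optim.Adam(flow.parameters())

x = torch.randn(256, 2)       # stand-in for a batch of training data
optimizer.zero_grad()
z, log_det = flow(x)
loss = criterion(z, log_det)
loss.backward()
optimizer.step()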
Affine Transform

class pyblaze.nn.AffineTransform(dim)
An affine transformation may be used to transform an input variable linearly. It computes the following function for \(\mathbf{z} \in \mathbb{R}^D\):
\[f_{\mathbf{a}, \mathbf{b}}(\mathbf{z}) = \mathbf{a} \odot \mathbf{z} + \mathbf{b}\]

with \(\mathbf{a} \in \mathbb{R}^D_+\) and \(\mathbf{b} \in \mathbb{R}^D\).
The log-determinant of its Jacobian is given as follows:

\[\sum_{k=1}^D{\log{a_k}}\]

Although this transformation is theoretically invertible, the inverse function is not implemented at the moment.
__init__(dim)
Initializes a new affine transformation.
- Parameters
dim (int) – The dimension of the inputs to the function.
Planar Transform

class pyblaze.nn.PlanarTransform(dim)
A planar transformation may be used to split the input along a hyperplane. It was introduced in "Variational Inference with Normalizing Flows" (Rezende and Mohamed, 2015). It computes the following function for \(\mathbf{z} \in \mathbb{R}^D\) (although the planar transform was introduced for an arbitrary activation function \(\sigma\), this transform restricts the usage to \(\tanh\)):
\[f_{\mathbf{u}, \mathbf{w}, b}(\mathbf{z}) = \mathbf{z} + \mathbf{u} \tanh(\mathbf{w}^T \mathbf{z} + b)\]

with \(\mathbf{u}, \mathbf{w} \in \mathbb{R}^D\) and \(b \in \mathbb{R}\).
The log-determinant of its Jacobian is given as follows:

\[\log\left| 1 + \mathbf{u}^T ((1 - \tanh^2(\mathbf{w}^T \mathbf{z} + b))\mathbf{w}) \right|\]

This transform is invertible for its outputs.
__init__(dim)
Initializes a new planar transformation.
- Parameters
dim (int) – The dimension of the inputs to the function.
Radial Transform

class pyblaze.nn.RadialTransform(dim)
A radial transformation may be used to apply radial contractions and expansions around a reference point. It was introduced in "Variational Inference with Normalizing Flows" (Rezende and Mohamed, 2015). It computes the following function for \(\mathbf{z} \in \mathbb{R}^D\):
\[f_{\mathbf{z}_0, \alpha, \beta}(\mathbf{z}) = \mathbf{z} + \beta h(\alpha, r) (\mathbf{z} - \mathbf{z}_0)\]

with \(\mathbf{z}_0 \in \mathbb{R}^D\), \(\alpha \in \mathbb{R}^+\), \(\beta \in \mathbb{R}\), \(r = \lVert \mathbf{z} - \mathbf{z}_0 \rVert_2\) and \(h(\alpha, r) = (\alpha + r)^{-1}\).
The log-determinant of its Jacobian is given as follows:

\[(D - 1) \log\left(1 + \beta h(\alpha, r)\right) + \log\left(1 + \beta h(\alpha, r) - \beta h^2(\alpha, r) r \right)\]

This transform is invertible for its outputs; however, there does not exist a closed-form solution for computing the inverse in general.
__init__(dim)
Initializes a new radial transformation.
- Parameters
dim (int) – The dimension of the inputs to the function.
Affine Coupling Transform 1D

class pyblaze.nn.AffineCouplingTransform1d(dim, fixed_dim, net, constrain_scale=False)
An affine coupling transforms the input by splitting it into two parts and transforming the second part by an arbitrary function depending on the first part. It was introduced in "Density Estimation Using Real NVP" (Dinh et al., 2017). It computes the following function for \(\mathbf{z} \in \mathbb{R}^D\) and a dimension \(d < D\):
\[f_{\mathbf{\omega}_s, \mathbf{\omega}_m}(\mathbf{z}) = [\mathbf{z}_{1:d}, \mathbf{z}_{d+1:D} \odot \exp(g_{\mathbf{\omega}_s}(\mathbf{z}_{1:d})) + h_{\mathbf{\omega}_m}(\mathbf{z}_{1:d})]^T\]

with \(g, h: \mathbb{R}^d \rightarrow \mathbb{R}^{D-d}\) being arbitrary parametrized functions (e.g. neural networks) computing the log-scale and the translation, respectively.
The log-determinant of its Jacobian is given as follows:

\[\sum_{k=1}^{D-d}{\left[g_{\mathbf{\omega}_s}(\mathbf{z}_{1:d})\right]_k}\]

Additionally, this transform can be easily conditioned on another input variable \(\mathbf{x}\) by conditioning the functions \(g, h\) on it. This transform is invertible and the inverse computation will be added in the future.
Note
As only part of the input is transformed, consider using this class with the reverse flag set alternately.
__init__(dim, fixed_dim, net, constrain_scale=False)
Initializes a new affine coupling transformation.
- Parameters
dim (int) – The dimensionality of the input.
fixed_dim (int) – The dimensionality of the input space that is not transformed. Must be smaller than the dimension.
net (torch.nn.Module [N, F] -> [N, F*2]) – An arbitrary neural network taking as input the fixed part of the input and outputting a mean and a log-scale used for translating and scaling the transformed part of the input, respectively, as a single tensor which will be split. In case this affine coupling is used with conditioning, the net's input dimension should be modified accordingly (batch size N, fixed dimension F).
constrain_scale (bool, default: False) – Whether to constrain the scale parameter that the output is multiplied by. This should be set for deep normalizing flows where no batch normalization is used.
forward(z, condition=None)
Transforms the given input.
- Parameters
z (torch.Tensor [N, D]) – The given input (batch size N, dimensionality D).
condition (torch.Tensor [N, C]) – An optional tensor on which this layer's net is conditioned. This value will be concatenated with the part of z that is passed to this layer's net (condition dimension C).
- Returns
torch.Tensor [N, D] – The transformed input.
torch.Tensor [N] – The log-determinants of the Jacobian evaluated at z.
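A sketch of constructing a coupling layer with a small MLP computing scale and translation. Here D = 4 and F = 2, so the net maps the 2 fixed dimensions to 2 * 2 = 4 outputs which the layer splits:

import torch
import pyblaze.nn as xnn

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 4),   # mean and log-scale for the transformed part
)
coupling = xnn.AffineCouplingTransform1d(dim=4, fixed_dim=2, net=net)

z = torch.randn(16, 4)
out, log_det = coupling(z)    # shapes [16, 4] and [16]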
Masked Autoregressive Transform 1D

class pyblaze.nn.MaskedAutoregressiveTransform1d(dim, *hidden_dims, activation=LeakyReLU(negative_slope=0.01), constrain_scale=False)
1-dimensional masked autoregressive transform as introduced in "Masked Autoregressive Flow for Density Estimation" (Papamakarios et al., 2018).
__init__(dim, *hidden_dims, activation=LeakyReLU(negative_slope=0.01), constrain_scale=False)
Initializes a new MAF transform that is backed by a pyblaze.nn.MADE model.
- Parameters
dim (int) – The dimension of the inputs.
hidden_dims (varargs of int) – The hidden dimensions of the MADE model.
activation (torch.nn.Module, default: torch.nn.LeakyReLU()) – The activation function to use in the MADE model.
constrain_scale (bool, default: False) – Whether to constrain the scale parameter that the output is multiplied by. This should be set for deep normalizing flows where no batch normalization is used.
BatchNorm Transform 1D

class pyblaze.nn.BatchNormTransform1d(dim, eps=1e-05, momentum=0.1)
1-dimensional batch normalization layer for stabilizing deep normalizing flows. It was first introduced in "Density Estimation Using Real NVP" (Dinh et al., 2017).
__init__(dim, eps=1e-05, momentum=0.1)
Initializes a new batch normalization layer for one-dimensional vectors of the given dimension.
- Parameters
dim (int) – The dimension of the inputs.
eps (float, default: 1e-5) – A small value added in the denominator for numerical stability.
momentum (float, default: 0.1) – Value used for calculating running average statistics.
forward(z)
Transforms the given input.
Note
During testing, for inputs that differ greatly from the inputs seen during training, this module is generally prone to outputting non-finite float values. In that case, these inputs are considered to be "impossible" to observe: the transformed output is set to all zeros and the log-determinant is set to -inf.
- Parameters
z (torch.Tensor [N, D]) – The given input (batch size N, dimensionality D).
- Returns
torch.Tensor [N, D] – The transformed input.
torch.Tensor [N] – The log-determinants of the Jacobian evaluated at z.