Data

import pyblaze.nn as xnn

The data module provides utilities for working with PyTorch datasets.

Extensions

Deterministic Splitting

pyblaze.nn.data.extensions.split(self, condition)

Splits the dataset into two subsets according to the given boolean condition, which is evaluated on each item. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.

Attention

Do not call this method on iterable datasets.

Parameters

condition (callable (object) -> bool) – The condition which splits the dataset.

Returns

  • torch.utils.data.Subset – The dataset with the items for which the condition evaluated to true.

  • torch.utils.data.Subset – The dataset with the items for which the condition evaluated to false.
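The following plain-Python sketch illustrates the documented semantics of split() without depending on pyblaze or PyTorch. The helper split_indices is hypothetical and not part of pyblaze. It assumes the condition receives each item exactly as returned by dataset[i], and it computes the two index lists that, in PyTorch, could be wrapped as torch.utils.data.Subset(dataset, indices).

```python
from typing import Callable, List, Sequence, Tuple


def split_indices(dataset: Sequence,
                  condition: Callable[[object], bool]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: partition dataset indices by a boolean condition.

    Items for which the condition is true go into the first list,
    all other items into the second.
    """
    true_idx: List[int] = []
    false_idx: List[int] = []
    # Random access via __len__/__getitem__ is required, which is why
    # this kind of split cannot work on iterable-style datasets.
    for i in range(len(dataset)):
        (true_idx if condition(dataset[i]) else false_idx).append(i)
    return true_idx, false_idx


# Toy "dataset" of (feature, label) pairs; select the label-0 items.
data = [(0.1, 0), (0.5, 1), (0.9, 0), (0.3, 1)]
left, right = split_indices(data, lambda item: item[1] == 0)
print(left, right)  # [0, 2] [1, 3]
```

With pyblaze.nn imported, the documented form of the call is `pos, neg = dataset.split(condition)`, which returns the two subsets directly.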

Random Splitting

pyblaze.nn.data.extensions.random_split(self, *sizes, seed=None)

Splits the dataset randomly into multiple subsets. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.

Attention

Do not call this method on iterable datasets.

Parameters
  • sizes (variadic argument of float) – The sizes of the splits, given as fractions of the dataset size. Hence, the sizes must sum to 1.

  • seed (int, default: None) – If given, uses the specified seed to sample the indices for each subset.

Returns

The random splits of this dataset.

Return type

list of torch.utils.data.Subset
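A minimal sketch of fractional random splitting in plain Python, assuming a shuffle-then-slice strategy. The helper random_split_indices is hypothetical, and its rounding rule (the last split absorbs any remainder) is an assumption; pyblaze may round differently.

```python
import random
from typing import List, Optional


def random_split_indices(n: int, *sizes: float,
                         seed: Optional[int] = None) -> List[List[int]]:
    """Hypothetical helper: randomly partition range(n) by fractional sizes."""
    if abs(sum(sizes) - 1.0) > 1e-6:
        raise ValueError("sizes must sum to 1")
    # A dedicated generator makes the result reproducible for a fixed seed
    # without touching the global random state.
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    # Convert fractions to counts; assumption: the last split absorbs the
    # rounding remainder so that every index is used exactly once.
    counts = [int(n * s) for s in sizes[:-1]]
    counts.append(n - sum(counts))
    splits, start = [], 0
    for c in counts:
        splits.append(indices[start:start + c])
        start += c
    return splits


train, val, test = random_split_indices(10, 0.6, 0.2, 0.2, seed=42)
print(len(train), len(val), len(test))  # 6 2 2
```

With pyblaze.nn imported, the documented call would be `train, val, test = dataset.random_split(0.6, 0.2, 0.2, seed=42)`.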

Data Loader

pyblaze.nn.data.extensions.loader(self, **kwargs)

Returns a data loader for this dataset. If the dataset defines a collate_fn function, it is automatically passed to the data loader. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.

Parameters

kwargs (keyword arguments) – Parameters passed directly to the DataLoader.

Returns

The data loader with the specified attributes.

Return type

torch.utils.data.DataLoader
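The collate_fn forwarding described above can be sketched in plain Python as a function that assembles the keyword arguments for torch.utils.data.DataLoader. Both loader_kwargs and PairDataset are hypothetical names, and the precedence rule (an explicitly passed collate_fn wins) is an assumption not stated in the text.

```python
from typing import Any, Dict


def loader_kwargs(dataset: Any, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: build DataLoader arguments for a dataset.

    If the dataset defines a callable collate_fn, it is forwarded.
    Assumption: an explicitly passed collate_fn takes precedence.
    """
    params = dict(kwargs)
    collate = getattr(dataset, "collate_fn", None)
    if callable(collate):
        params.setdefault("collate_fn", collate)
    return params


class PairDataset:
    """Toy map-style dataset that knows how to batch its own items."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]

    def collate_fn(self, batch):
        # Transpose a list of (x, y) pairs into ([x, ...], [y, ...]).
        xs, ys = zip(*batch)
        return list(xs), list(ys)


ds = PairDataset([(1, "a"), (2, "b")])
kwargs = loader_kwargs(ds, batch_size=2, shuffle=True)
print(sorted(kwargs))                  # ['batch_size', 'collate_fn', 'shuffle']
print(kwargs["collate_fn"](ds.items))  # ([1, 2], ['a', 'b'])
```

In PyTorch, the resulting dictionary would be unpacked as `DataLoader(ds, **kwargs)`; with pyblaze.nn imported, the documented equivalent is `ds.loader(batch_size=2, shuffle=True)`.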

Datasets

Noise

class pyblaze.nn.NoiseDataset(latent_dim=2, distribution=None)

Infinite dataset for generating noise from a given probability distribution. Typically used with generative adversarial networks.

__init__(latent_dim=2, distribution=None)

Initializes a new dataset where noise is sampled from the given distribution. If no distribution is given, noise is sampled from a multivariate normal distribution of dimension latent_dim.

Parameters
  • latent_dim (int) – The latent dimension of the normal distribution the noise is sampled from.

  • distribution (torch.distributions.Distribution) – The noise distribution to use. Overrides latent_dim if specified.
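The idea of an infinite noise dataset can be sketched with the standard library alone. NoiseSketch and its seed parameter are hypothetical; the sketch assumes the default is a standard normal N(0, I) and models the dataset as an endless iterator, which may differ from how pyblaze implements it internally. A comparable PyTorch distribution would be `torch.distributions.MultivariateNormal(torch.zeros(latent_dim), torch.eye(latent_dim))`.

```python
import itertools
import random
from typing import Iterator, List, Optional


class NoiseSketch:
    """Hypothetical stand-in for NoiseDataset using only the standard library."""

    def __init__(self, latent_dim: int = 2, seed: Optional[int] = None):
        self.latent_dim = latent_dim
        # `seed` is a hypothetical addition for reproducibility.
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[List[float]]:
        # Never terminates: each step yields an independent latent vector
        # with i.i.d. standard normal components.
        while True:
            yield [self.rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]


# An infinite source must be truncated explicitly, e.g. with islice.
samples = list(itertools.islice(NoiseSketch(latent_dim=3, seed=0), 5))
print(len(samples), len(samples[0]))  # 5 3
```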

class pyblaze.nn.LabeledNoiseDataset(latent_dim=2, num_classes=10, distribution=None, categorical=None)

Infinite dataset for generating noise from a given probability distribution together with a class label. Typically used with generative adversarial networks conditioned on class labels.

__init__(latent_dim=2, num_classes=10, distribution=None, categorical=None)

Initializes a new dataset where noise and a label are sampled from the given distributions. If no distributions are given, noise is sampled from a multivariate normal distribution of dimension latent_dim and the label is sampled from a categorical distribution over num_classes classes.

Parameters
  • latent_dim (int) – The latent dimension of the normal distribution the noise is sampled from.

  • num_classes (int) – The number of classes of the categorical distribution the label is sampled from.

  • distribution (torch.distributions.Distribution) – The noise distribution to use. Overrides latent_dim if specified.

  • categorical (torch.distributions.Distribution) – The distribution to sample labels from. Overrides num_classes if specified.
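A plain-Python sketch of the labeled variant, which yields (noise, label) pairs. LabeledNoiseSketch and its seed parameter are hypothetical; the sketch assumes standard normal noise and a uniform default categorical distribution over num_classes classes, neither of which the text states explicitly.

```python
import itertools
import random
from typing import Iterator, List, Optional, Tuple


class LabeledNoiseSketch:
    """Hypothetical stand-in for LabeledNoiseDataset (standard library only)."""

    def __init__(self, latent_dim: int = 2, num_classes: int = 10,
                 seed: Optional[int] = None):
        self.latent_dim = latent_dim
        self.num_classes = num_classes
        self.rng = random.Random(seed)  # `seed` is a hypothetical addition

    def __iter__(self) -> Iterator[Tuple[List[float], int]]:
        while True:
            z = [self.rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
            # Assumption: uniform label in [0, num_classes).
            label = self.rng.randrange(self.num_classes)
            yield z, label


pairs = list(itertools.islice(LabeledNoiseSketch(latent_dim=2, num_classes=10, seed=0), 100))
print(len(pairs), all(0 <= y < 10 for _, y in pairs))  # 100 True
```

A conditional generator would consume both parts of each pair: the noise vector as its input and the label as the condition.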

Data Loaders

Zipping

class pyblaze.nn.ZipDataLoader(lhs_loader, rhs_loader, lhs_count=1, rhs_count=1)

A data loader that zips together two underlying data loaders. Both data loaders must use the same batch size, and drop_last must be set to True on every data loader that samples from a fixed-size dataset. Whenever at least one of the data loaders has a fixed size, this data loader defines a length, given as the minimum of the two lengths, each divided by its respective count.

A common use case for this class is training Wasserstein GANs, where the critic is trained for multiple iterations per data batch.

__init__(lhs_loader, rhs_loader, lhs_count=1, rhs_count=1)

Initializes a new data loader.

Parameters
  • lhs_loader (torch.utils.data.DataLoader) – The data loader to sample from for the first item of the data tuple.

  • rhs_loader (torch.utils.data.DataLoader) – The data loader to sample from for the second item of the data tuple.

  • lhs_count (int) – The number of items to sample for the first item of the data tuple.

  • rhs_count (int) – The number of items to sample for the second item of the data tuple.
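The length rule and the zipping behavior described above can be sketched in plain Python. zip_length and zip_loaders are hypothetical names, None stands in for an infinite loader, and yielding the bare batch when a count is 1 (but a list of batches otherwise) is an assumption about the output format.

```python
import itertools
from typing import Iterable, Iterator, Optional, Tuple


def zip_length(lhs_len: Optional[int], rhs_len: Optional[int],
               lhs_count: int = 1, rhs_count: int = 1) -> Optional[int]:
    """Length rule from the text; None stands for an infinite loader."""
    candidates = []
    if lhs_len is not None:
        candidates.append(lhs_len // lhs_count)
    if rhs_len is not None:
        candidates.append(rhs_len // rhs_count)
    # Defined as soon as at least one side has a fixed size.
    return min(candidates) if candidates else None


def zip_loaders(lhs: Iterable, rhs: Iterable,
                lhs_count: int = 1, rhs_count: int = 1) -> Iterator[Tuple[object, object]]:
    """Hypothetical generator: draw lhs_count / rhs_count batches per step."""
    lhs_it, rhs_it = iter(lhs), iter(rhs)
    while True:
        try:
            left = [next(lhs_it) for _ in range(lhs_count)]
            right = [next(rhs_it) for _ in range(rhs_count)]
        except StopIteration:
            return  # stop as soon as either side runs out
        # Assumption: a count of 1 yields the batch itself, not a 1-element list.
        yield (left[0] if lhs_count == 1 else left,
               right[0] if rhs_count == 1 else right)


data_batches = [f"batch{i}" for i in range(7)]  # finite loader, 7 batches
noise_batches = itertools.count()                # infinite loader
print(zip_length(len(data_batches), None, lhs_count=2))  # 3
steps = list(zip_loaders(data_batches, noise_batches, lhs_count=2))
print(len(steps), steps[0])  # 3 (['batch0', 'batch1'], 0)
```

The leftover seventh data batch is dropped because it cannot fill a complete step, which matches the integer division in the length rule.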