Data

import pyblaze.nn as xnn

The data module provides utilities for working with PyTorch datasets.

Contents

Extensions
Datasets
Data Loaders

Extensions

Deterministic Splitting

pyblaze.nn.data.extensions.split(self, condition)

Splits the dataset according to the given boolean condition. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.

Attention

Do not call this method on iterable datasets.

Parameters

condition (callable (object) -> bool) – A predicate that is evaluated on each item of the dataset and decides which of the two subsets the item is assigned to.

Returns

torch.utils.data.Subset – The dataset with the items for which the condition evaluated to true.
torch.utils.data.Subset – The dataset with the items for which the condition evaluated to false.
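
The effect of a deterministic split can be sketched in plain Python, without PyTorch. The helper `split_indices` below is hypothetical and is not pyblaze's implementation. It only illustrates how a condition partitions item indices into two groups, which is the information a pair of torch.utils.data.Subset objects would carry.

```python
def split_indices(items, condition):
    """Partition the indices of `items` by a boolean condition.

    Hypothetical helper for illustration only, not part of pyblaze.
    """
    true_idx, false_idx = [], []
    for i, item in enumerate(items):
        # Each item is evaluated once and lands in exactly one group.
        (true_idx if condition(item) else false_idx).append(i)
    return true_idx, false_idx


data = [3, 8, 1, 10, 6]
large, small = split_indices(data, lambda x: x > 5)
print(large, small)
```

Going by the signature documented above, the corresponding call on a dataset would presumably take the form `matching, rest = dataset.split(condition)` once pyblaze.nn has been imported.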

Random Splitting

pyblaze.nn.data.extensions.random_split(self, *sizes, seed=None)

Splits the dataset randomly into multiple subsets. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.

Attention

Do not call this method on iterable datasets.

Parameters

sizes (variadic argument of float) – The sizes of the splits, given as fractions of the size of the dataset. The sizes must therefore sum to 1.
seed (int, default: None) – If given, the seed used to sample the indices of each subset, which makes the split reproducible.

Returns

The random splits of this dataset.

Return type

list of torch.utils.data.Subset
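
A fractional, seeded split can be sketched with the standard library alone. The helper `random_split_indices` is hypothetical, and its rounding rule (round each fraction, let the last split take the remainder) is an assumption of this sketch rather than documented pyblaze behavior.

```python
import random


def random_split_indices(n, *sizes, seed=None):
    """Split range(n) into shuffled index lists with the given fractions.

    Hypothetical helper; the rounding rule is an assumption of this sketch.
    """
    if abs(sum(sizes) - 1.0) > 1e-9:
        raise ValueError("sizes must sum to 1")
    indices = list(range(n))
    # A dedicated generator keeps the shuffle reproducible for a fixed seed.
    random.Random(seed).shuffle(indices)
    splits, start = [], 0
    for k, fraction in enumerate(sizes):
        # The last split takes the remainder so that no item is dropped.
        end = n if k == len(sizes) - 1 else start + int(round(fraction * n))
        splits.append(indices[start:end])
        start = end
    return splits


train, val, held_out = random_split_indices(10, 0.7, 0.2, 0.1, seed=42)
```

Calling the helper again with the same seed yields the same index lists, which is the property the seed parameter is meant to provide.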

Data Loader

pyblaze.nn.data.extensions.loader(self, **kwargs)

Returns a data loader for this dataset. If the dataset defines a collate_fn function, it is automatically passed to the data loader. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.

Parameters

kwargs (keyword arguments) – Parameters passed directly to the DataLoader.

Returns

The data loader with the specified attributes.

Return type

torch.utils.data.DataLoader
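
The automatic collate_fn pickup can be pictured as follows. This is a sketch only: `PairDataset` and `loader_kwargs` are hypothetical names, no PyTorch objects are involved, and the choice to let an explicitly passed collate_fn take precedence is an assumption that the text above does not confirm.

```python
class PairDataset:
    """Toy dataset of (sequence, label) pairs; stands in for a torch Dataset."""

    def __init__(self, pairs):
        self.pairs = pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        return self.pairs[index]

    def collate_fn(self, batch):
        # Keep variable-length sequences as a list instead of stacking them.
        sequences, labels = zip(*batch)
        return list(sequences), list(labels)


def loader_kwargs(dataset, **kwargs):
    """Build keyword arguments for a data loader (hypothetical helper)."""
    collate_fn = getattr(dataset, "collate_fn", None)
    if collate_fn is not None:
        # Assumption: an explicitly passed collate_fn is not overridden.
        kwargs.setdefault("collate_fn", collate_fn)
    return kwargs


dataset = PairDataset([([1, 2], 0), ([3], 1)])
options = loader_kwargs(dataset, batch_size=2, shuffle=True)
batch = options["collate_fn"]([dataset[0], dataset[1]])
```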

Datasets

Transform

Noise

class pyblaze.nn.NoiseDataset(latent_dim=2, distribution=None)

Infinite dataset for generating noise from a given probability distribution. Typically used with generative adversarial networks.

__init__(latent_dim=2, distribution=None)

Initializes a new dataset where noise is sampled from the given distribution. If no distribution is given, noise is sampled from a multivariate normal distribution with latent_dim dimensions.

Parameters

latent_dim (int) – The dimension of the normal distribution the noise is sampled from.
distribution (torch.distributions.Distribution) – The distribution to sample noise from. Overrides latent_dim if specified.
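
Because the dataset is infinite, a consumer has to bound the iteration itself. The generator below is a standard-library sketch of that idea, not the pyblaze class. `noise_stream` is a hypothetical name, and the use of independent standard-normal components is an assumption, since the text above only says "multivariate normal".

```python
import itertools
import random


def noise_stream(latent_dim=2, seed=None):
    """Yield noise vectors forever (illustrative only, not pyblaze code)."""
    rng = random.Random(seed)
    while True:
        # Assumption: independent N(0, 1) draws, one per latent dimension.
        yield [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]


# An infinite stream has no length, so take a finite number of samples.
samples = list(itertools.islice(noise_stream(latent_dim=3, seed=0), 4))
```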

class pyblaze.nn.LabeledNoiseDataset(latent_dim=2, num_classes=10, distribution=None, categorical=None)

Infinite dataset for generating noise from a given probability distribution. Typically used with generative adversarial networks conditioned on class labels.

__init__(latent_dim=2, num_classes=10, distribution=None, categorical=None)

Initializes a new dataset where noise and a label are sampled from the given distributions. If no distribution is given, noise is sampled from a multivariate normal distribution with latent_dim dimensions. If no categorical distribution is given, the label is sampled from a categorical distribution over num_classes classes.

Parameters

latent_dim (int) – The dimension of the normal distribution the noise is sampled from.
num_classes (int) – The number of classes of the categorical distribution the label is sampled from.
distribution (torch.distributions.Distribution) – The distribution to sample noise from. Overrides latent_dim if specified.
categorical (torch.distributions.Distribution) – The distribution to sample labels from. Overrides num_classes if specified.

Data Loaders

Zipping

class pyblaze.nn.ZipDataLoader(lhs_loader, rhs_loader, lhs_count=1, rhs_count=1)

A data loader that zips together two underlying data loaders. Both data loaders must use the same batch size, and drop_last must be set to True on data loaders that sample from a fixed-size dataset. Whenever one of the data loaders has a fixed size, this data loader defines a length. This length is given as the minimum of the two lengths, each divided by its respective count.

A common use case for this class is training Wasserstein GANs, where the critic is trained for multiple iterations for each data batch.

__init__(lhs_loader, rhs_loader, lhs_count=1, rhs_count=1)

Initializes a new data loader.

Parameters

lhs_loader (torch.utils.data.DataLoader) – The data loader to sample from for the first item of the data tuple.
rhs_loader (torch.utils.data.DataLoader) – The data loader to sample from for the second item of the data tuple.
lhs_count (int) – The number of items to sample for the first item of the data tuple.
rhs_count (int) – The number of items to sample for the second item of the data tuple.
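
The length rule and the zipping behavior can be illustrated without PyTorch. Both helpers below are hypothetical. Floor division, the use of None for a loader without a fixed size, and the grouping of batches into lists are assumptions of this sketch, not confirmed details of ZipDataLoader.

```python
import itertools


def zipped_length(lhs_len, rhs_len, lhs_count=1, rhs_count=1):
    """Number of zipped steps; pass None for a loader without a fixed size."""
    steps = []
    if lhs_len is not None:
        steps.append(lhs_len // lhs_count)
    if rhs_len is not None:
        steps.append(rhs_len // rhs_count)
    if not steps:
        raise TypeError("neither loader has a fixed size")
    return min(steps)


def zip_batches(lhs, rhs, lhs_count=1, rhs_count=1):
    """Yield (lhs_batches, rhs_batches) until either side is exhausted."""
    lhs_it, rhs_it = iter(lhs), iter(rhs)
    while True:
        left = list(itertools.islice(lhs_it, lhs_count))
        right = list(itertools.islice(rhs_it, rhs_count))
        if len(left) < lhs_count or len(right) < rhs_count:
            return
        yield left, right


# Ten "data batches" zipped five at a time with an endless "noise" source.
steps = list(zip_batches(range(10), itertools.count(), lhs_count=5))
length = zipped_length(10, None, lhs_count=5)
```

Here each step groups five batches from the finite side with one batch from the infinite side, so iteration stops after two steps, matching the computed length.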
