Data
import pyblaze.nn as xnn
The data module provides utilities for working with PyTorch datasets.
Extensions
Deterministic Splitting
pyblaze.nn.data.extensions.split(self, condition)
Splits the dataset according to the given boolean condition. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.
Attention
Do not call this method on iterable datasets.
- Parameters
condition (callable (object) -> bool) – The condition which splits the dataset.
- Returns
torch.utils.data.Subset – The dataset with the items for which the condition evaluated to true.
torch.utils.data.Subset – The dataset with the items for which the condition evaluated to false.
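To illustrate the semantics described above, here is a minimal plain-Python sketch that partitions item indices by a condition. The helper name `split_indices` is hypothetical and this is not pyblaze's implementation; the two index lists merely stand in for the two `torch.utils.data.Subset` objects that `split` returns.

```python
def split_indices(dataset, condition):
    """Partition the indices of a map-style dataset by a boolean condition.

    Hypothetical helper for illustration only; it mirrors the documented
    return order (items evaluating to true first, false second).
    """
    true_idx, false_idx = [], []
    for i in range(len(dataset)):
        # Evaluate the condition once per item, in index order.
        (true_idx if condition(dataset[i]) else false_idx).append(i)
    return true_idx, false_idx


items = [3, 8, 1, 10, 6]
large, small = split_indices(items, lambda x: x > 5)
```

A sketch like this needs `len()` and integer indexing, which is consistent with the warning not to call the method on iterable datasets.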
Random Splitting
pyblaze.nn.data.extensions.random_split(self, *sizes, seed=None)
Splits the dataset randomly into multiple subsets. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.
Attention
Do not call this method on iterable datasets.
- Parameters
sizes (variadic argument of float) – The sizes of the splits, given as fractions of the size of the dataset. Hence, the sizes must sum to 1.
seed (int, default: None) – If given, uses the specified seed to sample the indices for each subset.
- Returns
The random splits of this dataset.
- Return type
list of torch.utils.data.Subset
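The fractional, seeded splitting described above can be sketched in plain Python as follows. The helper `random_split_indices` is hypothetical and not pyblaze's implementation. How the library handles rounding is not stated here, so letting the last split absorb the remainder is an assumption.

```python
import random


def random_split_indices(n, *sizes, seed=None):
    """Randomly partition range(n) into len(sizes) disjoint index lists.

    Hypothetical helper; `sizes` are fractions that must sum to 1.
    """
    if abs(sum(sizes) - 1.0) > 1e-8:
        raise ValueError("sizes must sum to 1")
    indices = list(range(n))
    # A dedicated generator leaves the global random state untouched.
    random.Random(seed).shuffle(indices)
    splits, start = [], 0
    for k, fraction in enumerate(sizes):
        # Assumption: the last split absorbs any rounding remainder.
        end = n if k == len(sizes) - 1 else start + round(fraction * n)
        splits.append(indices[start:end])
        start = end
    return splits


train, val, test = random_split_indices(100, 0.7, 0.2, 0.1, seed=42)
```

Passing the same seed again reproduces the same index lists, which is the purpose of the `seed` parameter.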
Data Loader
pyblaze.nn.data.extensions.loader(self, **kwargs)
Returns a data loader for this dataset. If the dataset defines a collate_fn function, this is automatically set. When pyblaze.nn is imported, this method is available on all torch.utils.data.Dataset objects.
- Parameters
kwargs (keyword arguments) – Parameters passed directly to the DataLoader.
- Returns
The data loader with the specified attributes.
- Return type
torch.utils.data.DataLoader
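The automatic `collate_fn` pickup can be pictured with the following plain-Python sketch. `loader_kwargs` and `PairDataset` are hypothetical names, and whether an explicitly passed `collate_fn` overrides the dataset's own one is an assumption, not documented behavior. In real use, the resulting keyword arguments would go to `torch.utils.data.DataLoader`.

```python
def loader_kwargs(dataset, **kwargs):
    """Build the keyword arguments a data loader would receive.

    Hypothetical helper: picks up `dataset.collate_fn` if it exists.
    Assumption: an explicitly passed `collate_fn` takes precedence.
    """
    collate_fn = getattr(dataset, "collate_fn", None)
    if callable(collate_fn):
        kwargs.setdefault("collate_fn", collate_fn)
    return kwargs


class PairDataset:
    """Toy map-style dataset that defines its own collation."""

    def __init__(self, pairs):
        self.pairs = pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        return self.pairs[index]

    def collate_fn(self, batch):
        # Turn a list of (x, y) pairs into a pair of lists.
        xs, ys = zip(*batch)
        return list(xs), list(ys)


dataset = PairDataset([(1, "a"), (2, "b")])
options = loader_kwargs(dataset, batch_size=2, shuffle=True)
```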
Datasets
Noise
class pyblaze.nn.NoiseDataset(latent_dim=2, distribution=None)
Infinite dataset for generating noise from a given probability distribution. Usually used with generative adversarial networks.
__init__(latent_dim=2, distribution=None)
Initializes a new dataset where noise is sampled from the given distribution. If no distribution is given, noise is sampled from a multivariate normal distribution with a certain latent dimension.
- Parameters
latent_dim (int) – The latent dimension of the normal distribution the noise is sampled from.
distribution (torch.distributions.Distribution) – The noise type to use. If specified, overrides the setting of latent_dim.
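The idea of an infinite noise dataset can be sketched with a plain-Python generator. `noise_stream` is a hypothetical stand-in, not the `NoiseDataset` class; it draws each component from a standard normal distribution using only the standard library.

```python
import itertools
import random


def noise_stream(latent_dim=2, seed=None):
    """Yield an endless stream of standard-normal noise vectors.

    Hypothetical stand-in for an infinite, iterable-style dataset.
    """
    rng = random.Random(seed)
    while True:
        yield [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]


# The stream never ends, so take a bounded slice of it.
batch = list(itertools.islice(noise_stream(latent_dim=3, seed=0), 4))
```

Because the stream has no length, any consumer has to decide how many samples to draw, here by slicing.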
class pyblaze.nn.LabeledNoiseDataset(latent_dim=2, num_classes=10, distribution=None, categorical=None)
Infinite dataset for generating noise from a given probability distribution. Usually used with generative adversarial networks conditioned on class labels.
__init__(latent_dim=2, num_classes=10, distribution=None, categorical=None)
Initializes a new dataset where noise and a label are sampled from the given distributions. If no distribution is given, noise is sampled from a multivariate normal distribution with a certain latent dimension and the label is sampled from a categorical distribution.
- Parameters
latent_dim (int) – The latent dimension of the normal distribution the noise is sampled from.
num_classes (int) – The number of classes of the categorical distribution the label is sampled from.
distribution (torch.distributions.Distribution) – The noise type to use. If specified, overrides the setting of latent_dim.
categorical (torch.distributions.Distribution) – The distribution to sample labels from. If specified, overrides the setting of num_classes.
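A labeled variant of the same idea pairs each noise vector with a class index. The generator below is a hypothetical plain-Python sketch, not the `LabeledNoiseDataset` class. It assumes labels are drawn uniformly from the `num_classes` indices, which the description above does not guarantee.

```python
import itertools
import random


def labeled_noise_stream(latent_dim=2, num_classes=10, seed=None):
    """Yield endless (noise, label) pairs; hypothetical illustration."""
    rng = random.Random(seed)
    while True:
        noise = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        # Assumption: labels are uniform over the class indices.
        label = rng.randrange(num_classes)
        yield noise, label


samples = list(itertools.islice(labeled_noise_stream(num_classes=3, seed=1), 5))
```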
Data Loaders
Zipping
class pyblaze.nn.ZipDataLoader(lhs_loader, rhs_loader, lhs_count=1, rhs_count=1)
A data loader that zips together two underlying data loaders. Both data loaders must sample the same batch size, and drop_last must be set to True on data loaders that sample from a fixed-size dataset. Whenever one of the data loaders has a fixed size, this data loader defines a length. This length is the minimum of the two lengths, each divided by its respective count.
A common use case for this class is Wasserstein GANs, where the critic is trained for multiple iterations for each data batch.
__init__(lhs_loader, rhs_loader, lhs_count=1, rhs_count=1)
Initializes a new data loader.
- Parameters
lhs_loader (torch.utils.data.DataLoader) – The data loader to sample from for the first item of the data tuple.
rhs_loader (torch.utils.data.DataLoader) – The data loader to sample from for the second item of the data tuple.
lhs_count (int) – The number of items to sample for the first item of the data tuple.
rhs_count (int) – The number of items to sample for the second item of the data tuple.
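The length rule and the zipping behavior can be sketched in plain Python as below. `zip_length` and `zip_batches` are hypothetical helpers, not the `ZipDataLoader` class. Floor division for the length and grouping each side into a list (even for a count of 1) are assumptions made for the illustration.

```python
import itertools


def zip_length(lhs_len, rhs_len, lhs_count=1, rhs_count=1):
    """Number of zipped steps; pass None for an infinite side.

    Assumption: incomplete groups at the end are dropped (floor division).
    """
    steps = []
    if lhs_len is not None:
        steps.append(lhs_len // lhs_count)
    if rhs_len is not None:
        steps.append(rhs_len // rhs_count)
    if not steps:
        raise TypeError("length is undefined when both sides are infinite")
    return min(steps)


def zip_batches(lhs, rhs, lhs_count=1, rhs_count=1):
    """Yield (lhs_group, rhs_group) tuples from two batch iterables."""
    lhs_it, rhs_it = iter(lhs), iter(rhs)
    while True:
        lhs_group = list(itertools.islice(lhs_it, lhs_count))
        rhs_group = list(itertools.islice(rhs_it, rhs_count))
        # Stop as soon as either side cannot fill a complete group.
        if len(lhs_group) < lhs_count or len(rhs_group) < rhs_count:
            return
        yield lhs_group, rhs_group


data_batches = range(10)          # stands in for a fixed-size loader
noise_batches = itertools.count() # stands in for an infinite loader
pairs = list(zip_batches(data_batches, noise_batches, lhs_count=5, rhs_count=1))
```

Here each step consumes five batches from the fixed-size side and one from the infinite side, so ten data batches yield two steps.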