skorch.classifier
NeuralNet subclasses for classification tasks.
class skorch.classifier.NeuralNetBinaryClassifier(module, *args, criterion=<class 'torch.nn.modules.loss.BCEWithLogitsLoss'>, train_split=<skorch.dataset.CVSplit object>, threshold=0.5, **kwargs)[source]

NeuralNet for binary classification tasks.

Use this specifically if you have a binary classification task, with input data X and target y. y must be 1d.
In addition to the parameters listed below, there are parameters with specific prefixes that are handled separately. To illustrate this, here is an example:
>>> net = NeuralNet(
...     ...,
...     optimizer=torch.optim.SGD,
...     optimizer__momentum=0.95,
... )
This way, when optimizer is initialized, NeuralNet will take care of setting the momentum parameter to 0.95.

(Note that the double underscore notation in optimizer__momentum means that the parameter momentum should be set on the object optimizer. This is the same semantic as used by sklearn.)

Furthermore, this allows you to change those parameters later:

net.set_params(optimizer__momentum=0.99)

This can be useful when you want to change certain parameters using a callback, when using the net in an sklearn grid search, etc.
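The double-underscore routing described above can be pictured with a small, self-contained sketch. The route_params helper below is hypothetical, for illustration only; skorch's actual implementation differs.

```python
def route_params(params):
    """Group parameters by their double-underscore prefix, mirroring
    the sklearn/skorch naming convention: 'optimizer__momentum' is a
    'momentum' setting destined for the optimizer.  Keys without a
    prefix (e.g. 'lr') belong to the net itself, grouped under ''."""
    routed = {}
    for key, value in params.items():
        if "__" in key:
            target, name = key.split("__", 1)
        else:
            target, name = "", key
        routed.setdefault(target, {})[name] = value
    return routed

print(route_params({"optimizer__momentum": 0.95, "lr": 0.01}))
# {'optimizer': {'momentum': 0.95}, '': {'lr': 0.01}}
```

Only the prefix before the first double underscore is split off, so nested settings such as optimizer__param_groups__0 stay attached to their target.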
By default an EpochTimer, BatchScoring (for both training and validation datasets), and PrintLog callbacks are installed for the user's convenience.

Parameters:

- module : torch module (class or instance)
  A PyTorch Module. In general, the uninstantiated class should be passed, although instantiated modules will also work.
- criterion : torch criterion (class, default=torch.nn.BCEWithLogitsLoss)
  Binary cross entropy loss with logits. Note that the module should return the logit of probabilities with shape (batch_size,).
- threshold : float (default=0.5)
  Probabilities above this threshold are classified as 1. threshold is used by predict and predict_proba for classification.
- optimizer : torch optim (class, default=torch.optim.SGD)
  The uninitialized optimizer (update rule) used to optimize the module.
- lr : float (default=0.01)
  Learning rate passed to the optimizer. You may use lr instead of using optimizer__lr, which would result in the same outcome.
- max_epochs : int (default=10)
  The number of epochs to train for each fit call. Note that you may keyboard-interrupt training at any time.
- batch_size : int (default=128)
  Mini-batch size. Use this instead of setting iterator_train__batch_size and iterator_test__batch_size, which would result in the same outcome. If batch_size is -1, a single batch with all the data will be used during training and validation.
- iterator_train : torch DataLoader
  The default PyTorch DataLoader used for training data.
- iterator_valid : torch DataLoader
  The default PyTorch DataLoader used for validation and test data, i.e. during inference.
- dataset : torch Dataset (default=skorch.dataset.Dataset)
  The dataset is necessary for the incoming data to work with pytorch's DataLoader. It has to implement the __len__ and __getitem__ methods. The provided dataset should be capable of dealing with a lot of data types out of the box, so only change this if your data is not supported. You should generally pass the uninitialized Dataset class and define additional arguments to X and y by prefixing them with dataset__. It is also possible to pass an initialized Dataset, in which case no additional arguments may be passed.
- train_split : None or callable (default=skorch.dataset.CVSplit(5))
  If None, there is no train/validation split. Else, train_split should be a function or callable that is called with X and y data and should return the tuple dataset_train, dataset_valid. The validation data may be None.
- callbacks : None or list of Callback instances (default=None)
  More callbacks, in addition to those returned by get_default_callbacks. Each callback should inherit from Callback. If not None, a list of callbacks is expected where the callback names are inferred from the class name. Name conflicts are resolved by appending a count suffix starting with 1, e.g. EpochScoring_1. Alternatively, a tuple (name, callback) can be passed, where name should be unique. Callbacks may or may not be instantiated. The callback name can be used to set parameters on specific callbacks (e.g., for the callback with name 'print_log', use net.set_params(callbacks__print_log__keys_ignored=['epoch', 'train_loss'])).
- warm_start : bool (default=False)
  Whether each fit call should lead to a re-initialization of the module (cold start) or whether the module should be trained further (warm start).
- verbose : int (default=1)
  Control the verbosity level.
- device : str, torch.device (default='cpu')
  The compute device to be used. If set to 'cuda', data in torch tensors will be pushed to cuda tensors before being sent to the module.
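The train_split contract described above only requires a callable that is handed the data and returns a (dataset_train, dataset_valid) pair. A minimal sketch, with a plain sliceable list standing in for the dataset; the function name and the 20% holdout fraction are illustrative, not skorch API:

```python
def last_fifth_split(dataset, y=None):
    """Illustrative train_split callable: hold out the last 20% of
    samples for validation and return (dataset_train, dataset_valid).
    Works on anything sliceable; a real skorch Dataset would need
    index-based subsetting instead of Python slicing."""
    n_valid = max(1, len(dataset) // 5)
    return dataset[:-n_valid], dataset[-n_valid:]

train, valid = last_fifth_split(list(range(10)))
print(len(train), len(valid))  # 8 2
```

Passing such a callable via train_split=last_fifth_split would replace the default CVSplit(5) behavior; returning None as the second element disables validation scoring.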
Attributes:

- prefixes_ : list of str
  Contains the prefixes to special parameters. E.g., since there is the 'optimizer' prefix, it is possible to set parameters like so: NeuralNet(..., optimizer__momentum=0.95).
- cuda_dependent_attributes_ : list of str
  Contains a list of all attributes whose values depend on a CUDA device. If a NeuralNet trained with a CUDA-enabled device is unpickled on a machine without CUDA or with CUDA disabled, the listed attributes are mapped to CPU. Expand this list if you want to add other cuda-dependent attributes.
- initialized_ : bool
  Whether the NeuralNet was initialized.
- module_ : torch module (instance)
  The instantiated module.
- criterion_ : torch criterion (instance)
  The instantiated criterion.
- callbacks_ : list of tuples
  The complete (i.e. default and other) initialized callbacks, as tuples with unique names.
Methods

- check_data(X, y)
- evaluation_step(Xi[, training]) : Perform a forward step to produce the output used for prediction and scoring.
- fit(X, y, **fit_params) : See NeuralNet.fit.
- fit_loop(X[, y, epochs]) : The proper fit loop.
- forward(X[, training, device]) : Gather and concatenate the outputs from forward calls on the input data.
- forward_iter(X[, training, device]) : Yield outputs of module forward calls on each batch of data.
- get_dataset(X[, y]) : Get a dataset that contains the input data and is passed to the iterator.
- get_iterator(dataset[, training]) : Get an iterator that allows looping over the batches of the given data.
- get_loss(y_pred, y_true[, X, training]) : Return the loss for this batch.
- get_split_datasets(X[, y]) : Get internal train and validation datasets.
- get_train_step_accumulator() : Return the train step accumulator.
- infer(x, **fit_params) : Perform a single inference step on a batch of data.
- initialize() : Initializes all components of the NeuralNet and returns self.
- initialize_callbacks() : Initializes all callbacks and saves the result in the callbacks_ attribute.
- initialize_criterion() : Initializes the criterion.
- initialize_history() : Initializes the history.
- initialize_module() : Initializes the module.
- initialize_optimizer([triggered_directly]) : Initialize the model optimizer.
- load_history(f) : Load the history of a NeuralNet from a json file.
- load_params([f, f_params, f_optimizer, …]) : Loads the module's parameters, history, and optimizer, not the whole object.
- notify(method_name, **cb_kwargs) : Call the callback method specified in method_name with parameters specified in cb_kwargs.
- on_batch_begin(net[, Xi, yi, training])
- on_epoch_begin(net[, dataset_train, …])
- on_epoch_end(net[, dataset_train, dataset_valid])
- on_train_begin(net[, X, y])
- on_train_end(net[, X, y])
- partial_fit(X[, y, classes]) : Fit the module.
- predict(X) : Where applicable, return class labels for samples in X.
- predict_proba(X) : Where applicable, return probability estimates for samples.
- save_history(f) : Saves the history of NeuralNet as a json file.
- save_params([f, f_params, f_optimizer, …]) : Saves the module's parameters, history, and optimizer, not the whole object.
- set_params(**kwargs) : Set the parameters of this class.
- train_step(Xi, yi, **fit_params) : Prepares a loss function callable and passes it to the optimizer, hence performing one optimization step.
- train_step_single(Xi, yi, **fit_params) : Compute y_pred, the loss value, and update the net's gradients.
- validation_step(Xi, yi, **fit_params) : Perform a forward step using batched data and return the resulting loss.
- get_default_callbacks, get_params, initialize_virtual_params, on_batch_end, on_grad_computed
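The methods listed above compose into a training loop of roughly the following shape. This is a pure-Python sketch, not skorch's actual fit_loop: notify stands in for the callback dispatch, and train_step is reduced to a function returning a per-batch loss.

```python
def fit_loop_sketch(batches, train_step, epochs, notify=lambda name: None):
    """Simplified shape of the fit loop: per-epoch callback hooks wrap
    per-batch training steps, and the mean train loss per epoch is
    recorded in a history list (skorch's history records much more)."""
    history = []
    for _ in range(epochs):
        notify("on_epoch_begin")
        losses = []
        for Xi, yi in batches:
            notify("on_batch_begin")
            losses.append(train_step(Xi, yi))
            notify("on_batch_end")
        history.append(sum(losses) / len(losses))
        notify("on_epoch_end")
    return history

history = fit_loop_sketch(
    batches=[([0.0], 0), ([1.0], 1)],
    train_step=lambda Xi, yi: 0.5,
    epochs=3,
)
print(history)  # [0.5, 0.5, 0.5]
```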
fit(X, y, **fit_params)[source]

See NeuralNet.fit.

In contrast to NeuralNet.fit, y is non-optional to avoid mistakenly forgetting about y. However, y can be set to None in case it is derived dynamically from X.
predict(X)[source]

Where applicable, return class labels for samples in X.

If the module's forward method returns multiple outputs as a tuple, it is assumed that the first output contains the relevant information and the other values are ignored. If all values are relevant, consider using forward() instead.

Parameters:

- X : input data, compatible with skorch.dataset.Dataset
  By default, you should be able to pass:
  - numpy arrays
  - torch tensors
  - pandas DataFrame or Series
  - scipy sparse CSR matrices
  - a dictionary of the former three
  - a list/tuple of the former three
  - a Dataset
  If this doesn't work with your data, you have to pass a Dataset that can deal with the data.

Returns:

- y_pred : numpy ndarray
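The "first output of a tuple" convention mentioned above amounts to the following; the helper name is illustrative, not skorch's internal API:

```python
def relevant_output(module_output):
    """predict/predict_proba keep only the first element when the
    module's forward returns a tuple; everything else is ignored."""
    if isinstance(module_output, tuple):
        return module_output[0]
    return module_output

print(relevant_output(("logits", "hidden_state")))  # logits
print(relevant_output("logits"))                    # logits
```

If the extra outputs matter (e.g. attention weights), call forward() or forward_iter() instead, which return everything the module produces.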
predict_proba(X)[source]

Where applicable, return probability estimates for samples.

If the module's forward method returns multiple outputs as a tuple, it is assumed that the first output contains the relevant information and the other values are ignored. If all values are relevant, consider using forward() instead.

Parameters:

- X : input data, compatible with skorch.dataset.Dataset
  By default, you should be able to pass:
  - numpy arrays
  - torch tensors
  - pandas DataFrame or Series
  - scipy sparse CSR matrices
  - a dictionary of the former three
  - a list/tuple of the former three
  - a Dataset
  If this doesn't work with your data, you have to pass a Dataset that can deal with the data.

Returns:

- y_proba : numpy ndarray
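Taken together, the binary classifier's prediction path described above is: the module returns logits of shape (batch_size,), probabilities are recovered via the sigmoid, and predict compares them against threshold. A plain-Python sketch of that relationship, ignoring array shapes and assuming the strict "above this threshold" reading:

```python
import math

def proba_from_logits(logits):
    """Sigmoid of the module's raw logits (the form consumed by
    BCEWithLogitsLoss) recovers the predicted probabilities."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def labels_from_probas(probas, threshold=0.5):
    """Probabilities strictly above the threshold become class 1."""
    return [int(p > threshold) for p in probas]

probas = proba_from_logits([-2.0, 0.0, 3.0])
print(labels_from_probas(probas))  # [0, 0, 1]
```

Note that a logit of exactly 0.0 maps to probability 0.5, which under the default threshold is classified as 0.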
-
class
skorch.classifier.
NeuralNetClassifier
(module, *args, criterion=<class 'torch.nn.modules.loss.NLLLoss'>, train_split=<skorch.dataset.CVSplit object>, **kwargs)[source]¶ NeuralNet for classification tasks
Use this specifically if you have a standard classification task, with input data X and target y.
In addition to the parameters listed below, there are parameters with specific prefixes that are handled separately. To illustrate this, here is an example:
>>> net = NeuralNet(
...     ...,
...     optimizer=torch.optim.SGD,
...     optimizer__momentum=0.95,
... )
This way, when optimizer is initialized, NeuralNet will take care of setting the momentum parameter to 0.95.

(Note that the double underscore notation in optimizer__momentum means that the parameter momentum should be set on the object optimizer. This is the same semantic as used by sklearn.)

Furthermore, this allows you to change those parameters later:

net.set_params(optimizer__momentum=0.99)

This can be useful when you want to change certain parameters using a callback, when using the net in an sklearn grid search, etc.
By default an EpochTimer, BatchScoring (for both training and validation datasets), and PrintLog callbacks are installed for the user's convenience.

Parameters:

- module : torch module (class or instance)
  A PyTorch Module. In general, the uninstantiated class should be passed, although instantiated modules will also work.
- criterion : torch criterion (class, default=torch.nn.NLLLoss)
  Negative log likelihood loss. Note that the module should return probabilities; the log is applied during get_loss.
- optimizer : torch optim (class, default=torch.optim.SGD)
  The uninitialized optimizer (update rule) used to optimize the module.
- lr : float (default=0.01)
  Learning rate passed to the optimizer. You may use lr instead of using optimizer__lr, which would result in the same outcome.
- max_epochs : int (default=10)
  The number of epochs to train for each fit call. Note that you may keyboard-interrupt training at any time.
- batch_size : int (default=128)
  Mini-batch size. Use this instead of setting iterator_train__batch_size and iterator_test__batch_size, which would result in the same outcome. If batch_size is -1, a single batch with all the data will be used during training and validation.
- iterator_train : torch DataLoader
  The default PyTorch DataLoader used for training data.
- iterator_valid : torch DataLoader
  The default PyTorch DataLoader used for validation and test data, i.e. during inference.
- dataset : torch Dataset (default=skorch.dataset.Dataset)
  The dataset is necessary for the incoming data to work with pytorch's DataLoader. It has to implement the __len__ and __getitem__ methods. The provided dataset should be capable of dealing with a lot of data types out of the box, so only change this if your data is not supported. You should generally pass the uninitialized Dataset class and define additional arguments to X and y by prefixing them with dataset__. It is also possible to pass an initialized Dataset, in which case no additional arguments may be passed.
- train_split : None or callable (default=skorch.dataset.CVSplit(5))
  If None, there is no train/validation split. Else, train_split should be a function or callable that is called with X and y data and should return the tuple dataset_train, dataset_valid. The validation data may be None.
- callbacks : None or list of Callback instances (default=None)
  More callbacks, in addition to those returned by get_default_callbacks. Each callback should inherit from Callback. If not None, a list of callbacks is expected where the callback names are inferred from the class name. Name conflicts are resolved by appending a count suffix starting with 1, e.g. EpochScoring_1. Alternatively, a tuple (name, callback) can be passed, where name should be unique. Callbacks may or may not be instantiated. The callback name can be used to set parameters on specific callbacks (e.g., for the callback with name 'print_log', use net.set_params(callbacks__print_log__keys_ignored=['epoch', 'train_loss'])).
- warm_start : bool (default=False)
  Whether each fit call should lead to a re-initialization of the module (cold start) or whether the module should be trained further (warm start).
- verbose : int (default=1)
  Control the verbosity level.
- device : str, torch.device (default='cpu')
  The compute device to be used. If set to 'cuda', data in torch tensors will be pushed to cuda tensors before being sent to the module.
Attributes:

- prefixes_ : list of str
  Contains the prefixes to special parameters. E.g., since there is the 'optimizer' prefix, it is possible to set parameters like so: NeuralNet(..., optimizer__momentum=0.95).
- cuda_dependent_attributes_ : list of str
  Contains a list of all attributes whose values depend on a CUDA device. If a NeuralNet trained with a CUDA-enabled device is unpickled on a machine without CUDA or with CUDA disabled, the listed attributes are mapped to CPU. Expand this list if you want to add other cuda-dependent attributes.
- initialized_ : bool
  Whether the NeuralNet was initialized.
- module_ : torch module (instance)
  The instantiated module.
- criterion_ : torch criterion (instance)
  The instantiated criterion.
- callbacks_ : list of tuples
  The complete (i.e. default and other) initialized callbacks, as tuples with unique names.
Methods

- check_data(X, y)
- evaluation_step(Xi[, training]) : Perform a forward step to produce the output used for prediction and scoring.
- fit(X, y, **fit_params) : See NeuralNet.fit.
- fit_loop(X[, y, epochs]) : The proper fit loop.
- forward(X[, training, device]) : Gather and concatenate the outputs from forward calls on the input data.
- forward_iter(X[, training, device]) : Yield outputs of module forward calls on each batch of data.
- get_dataset(X[, y]) : Get a dataset that contains the input data and is passed to the iterator.
- get_iterator(dataset[, training]) : Get an iterator that allows looping over the batches of the given data.
- get_loss(y_pred, y_true, *args, **kwargs) : Return the loss for this batch.
- get_split_datasets(X[, y]) : Get internal train and validation datasets.
- get_train_step_accumulator() : Return the train step accumulator.
- infer(x, **fit_params) : Perform a single inference step on a batch of data.
- initialize() : Initializes all components of the NeuralNet and returns self.
- initialize_callbacks() : Initializes all callbacks and saves the result in the callbacks_ attribute.
- initialize_criterion() : Initializes the criterion.
- initialize_history() : Initializes the history.
- initialize_module() : Initializes the module.
- initialize_optimizer([triggered_directly]) : Initialize the model optimizer.
- load_history(f) : Load the history of a NeuralNet from a json file.
- load_params([f, f_params, f_optimizer, …]) : Loads the module's parameters, history, and optimizer, not the whole object.
- notify(method_name, **cb_kwargs) : Call the callback method specified in method_name with parameters specified in cb_kwargs.
- on_batch_begin(net[, Xi, yi, training])
- on_epoch_begin(net[, dataset_train, …])
- on_epoch_end(net[, dataset_train, dataset_valid])
- on_train_begin(net[, X, y])
- on_train_end(net[, X, y])
- partial_fit(X[, y, classes]) : Fit the module.
- predict(X) : Where applicable, return class labels for samples in X.
- predict_proba(X) : Where applicable, return probability estimates for samples.
- save_history(f) : Saves the history of NeuralNet as a json file.
- save_params([f, f_params, f_optimizer, …]) : Saves the module's parameters, history, and optimizer, not the whole object.
- set_params(**kwargs) : Set the parameters of this class.
- train_step(Xi, yi, **fit_params) : Prepares a loss function callable and passes it to the optimizer, hence performing one optimization step.
- train_step_single(Xi, yi, **fit_params) : Compute y_pred, the loss value, and update the net's gradients.
- validation_step(Xi, yi, **fit_params) : Perform a forward step using batched data and return the resulting loss.
- get_default_callbacks, get_params, initialize_virtual_params, on_batch_end, on_grad_computed
fit(X, y, **fit_params)[source]

See NeuralNet.fit.

In contrast to NeuralNet.fit, y is non-optional to avoid mistakenly forgetting about y. However, y can be set to None in case it is derived dynamically from X.
get_loss(y_pred, y_true, *args, **kwargs)[source]

Return the loss for this batch.

Parameters:

- y_pred : torch tensor
  Predicted target values.
- y_true : torch tensor
  True target values.
- X : input data, compatible with skorch.dataset.Dataset
  By default, you should be able to pass:
  - numpy arrays
  - torch tensors
  - pandas DataFrame or Series
  - scipy sparse CSR matrices
  - a dictionary of the former three
  - a list/tuple of the former three
  - a Dataset
  If this doesn't work with your data, you have to pass a Dataset that can deal with the data.
- training : bool (default=False)
  Whether train mode should be used or not.
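Because get_loss applies the log itself (see the criterion description above), the module can return plain probabilities, e.g. from a final Softmax layer. Per sample, the computed quantity is -log(p[target]); a minimal pure-Python sketch, with a hypothetical helper name:

```python
import math

def nll_from_probas(probas, target):
    """Negative log likelihood of the true class when the module
    outputs probabilities: the log is taken before NLLLoss, so
    the per-sample loss is -log(p[target])."""
    return -math.log(probas[target])

loss = nll_from_probas([0.1, 0.7, 0.2], target=1)
print(round(loss, 4))  # 0.3567
```

A confident, correct prediction (p[target] near 1) yields a loss near 0; a confident wrong one blows up, which is why the module's probabilities should never be exactly 0 for the true class.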
predict(X)[source]

Where applicable, return class labels for samples in X.

If the module's forward method returns multiple outputs as a tuple, it is assumed that the first output contains the relevant information and the other values are ignored. If all values are relevant, consider using forward() instead.

Parameters:

- X : input data, compatible with skorch.dataset.Dataset
  By default, you should be able to pass:
  - numpy arrays
  - torch tensors
  - pandas DataFrame or Series
  - scipy sparse CSR matrices
  - a dictionary of the former three
  - a list/tuple of the former three
  - a Dataset
  If this doesn't work with your data, you have to pass a Dataset that can deal with the data.

Returns:

- y_pred : numpy ndarray
predict_proba(X)[source]

Where applicable, return probability estimates for samples.

If the module's forward method returns multiple outputs as a tuple, it is assumed that the first output contains the relevant information and the other values are ignored. If all values are relevant, consider using forward() instead.

Parameters:

- X : input data, compatible with skorch.dataset.Dataset
  By default, you should be able to pass:
  - numpy arrays
  - torch tensors
  - pandas DataFrame or Series
  - scipy sparse CSR matrices
  - a dictionary of the former three
  - a list/tuple of the former three
  - a Dataset
  If this doesn't work with your data, you have to pass a Dataset that can deal with the data.

Returns:

- y_proba : numpy ndarray
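For the multiclass net, predict corresponds to taking the argmax over predict_proba's class axis. A pure-Python sketch of that relationship (the helper name is illustrative, and real skorch operates on numpy arrays):

```python
def argmax_labels(probas_batch):
    """Class label per sample = index of the largest probability,
    i.e. the argmax over the class axis of predict_proba's output."""
    return [max(range(len(p)), key=p.__getitem__) for p in probas_batch]

print(argmax_labels([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]))  # [1, 0]
```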