climb.tool.impl.data_suite.models package

Submodules

climb.tool.impl.data_suite.models.base_model module

class climb.tool.impl.data_suite.models.base_model.MyDataset(data, targets, transform=None)[source]

Bases: Dataset

class climb.tool.impl.data_suite.models.base_model.Net(dim)[source]

Bases: Module

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class climb.tool.impl.data_suite.models.base_model.benchmark_trainer(model, device)[source]

Bases: object

fit(train_loader, optimizer, epochs)[source]

For each epoch, iterate through the training data and train the model with the train loop.

Parameters:
  • train_loader – the training data loader

  • optimizer – The optimizer to use for training.

  • epochs – number of epochs to train for

predict(test_loader, mc_samples=None)[source]

> Generic prediction function that works both with and without MC sampling

Parameters:
  • test_loader – the test data loader

  • mc_samples – number of Monte Carlo samples to take

Returns:

The means and standard deviations of the predictions.
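The aggregation step described for predict can be illustrated without any deep-learning dependency. The following is a minimal sketch, not the package's implementation: noisy_predict is a hypothetical stand-in for a dropout-enabled network, and the behaviour for mc_samples=None (a single pass, so every standard deviation is zero) is an assumption, since the source does not document it.

```python
import random
import statistics


def noisy_predict(x, rng):
    """Hypothetical stochastic predictor standing in for a dropout-enabled network."""
    return 2.0 * x + rng.gauss(0.0, 0.1)


def predict_with_mc(inputs, mc_samples=None, seed=0):
    """Return per-input means and standard deviations.

    Assumption: with mc_samples=None a single pass is made, so every std is 0.0.
    """
    rng = random.Random(seed)
    n_passes = 1 if mc_samples is None else mc_samples
    # draws[i] collects the predictions made for inputs[i] across all passes.
    draws = [[noisy_predict(x, rng) for _ in range(n_passes)] for x in inputs]
    means = [statistics.fmean(d) for d in draws]
    stds = [statistics.pstdev(d) for d in draws]
    return means, stds


means, stds = predict_with_mc([0.0, 1.0, 2.0], mc_samples=50)
```

A larger standard deviation for an input indicates that the stochastic passes disagree more on that input.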

climb.tool.impl.data_suite.models.base_model.enable_dropout(model)[source]

climb.tool.impl.data_suite.models.benchmarks module

climb.tool.impl.data_suite.models.benchmarks.comparison_methods(x_train, y_train, x_test, y_test, inlier_ids, df_inlier, model_type, return_ids=True, seed=42)[source]

> This function takes in the training and testing data, the inlier ids, the dataframe of inliers, the model type, and a seed. It then fits the specific comparison model and returns the uncertainty scores

Parameters:
  • x_train – training data

  • y_train – the training labels

  • x_test – the test data

  • y_test – the true values of the test set

  • inlier_ids – the indices of the inliers

  • df_inlier – the dataframe of inliers

  • model_type – the type of model to use. Can be one of the following:

  • return_ids – If True, returns the ids of the most uncertain and least uncertain samples. If False, returns the uncertainty score. Defaults to True

  • seed – random seed. Defaults to 42

Returns:

The ids of the most uncertain and least uncertain samples if return_ids is True; otherwise the uncertainty score.

climb.tool.impl.data_suite.models.benchmarks.conformal(x_train, y_train, x_test, y_test, inlier_ids)[source]
climb.tool.impl.data_suite.models.benchmarks.sample_copula(x_train, y_train, x_test, y_test, inlier_ids)[source]
climb.tool.impl.data_suite.models.benchmarks.uncertainty_benchmark(x_train, y_train, x_test, y_test, y_test_ids, ids, model_type, wandb_dict, conformal_dict=None)[source]

> This function takes in a model type and trains the model. It performs the full uncertainty benchmarking: besides training the model, it also computes the uncertainty metrics and OOD metrics that are logged to wandb for the synthetic experiment.

Parameters:
  • x_train – the training data

  • y_train – the training labels

  • x_test – the test data

  • y_test – the true values of the test set

  • y_test_ids – the true labels of the test set

  • ids – the indices of the inliers in the test set

  • model_type – the type of model to use. Can be one of the following:

  • wandb_dict – a dictionary that will be used to store the results of the experiment.

  • conformal_dict – a dictionary of dataframes containing the conformal predictions for each feature.

Returns:

the dictionary wandb_dict which contains the results of the uncertainty benchmark.

climb.tool.impl.data_suite.models.conformal module

class climb.tool.impl.data_suite.models.conformal.conformal_class(base_name='rf', norm_name='knn', conformity_score='abs', normalize=True, input_dim=2, seed=42)[source]

Bases: object

fit(x_train, y_train)[source]

> This function takes in the training data and splits it into a training set and a calibration set. It is then used to fit the conformal predictor

Parameters:
  • x_train – The training data.

  • y_train – The target variable

predict(x_test, y_test, just_conf=False)[source]

> This function takes in the test data, and returns a dataframe with the confidence intervals, the true values, and the normalized confidence intervals.

Parameters:
  • x_test – the test data

  • y_test – the true values of the test set

  • just_conf – If True, only return the confidence intervals. Defaults to False

Returns:

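A dataframe with the confidence intervals, the true values, and the normalized confidence intervals; only the confidence intervals if just_conf is True.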
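The fit and predict steps above follow the general shape of split conformal prediction: hold out a calibration set, score its residuals, and use a quantile of those scores as the interval half-width. The sketch below illustrates that idea under stated assumptions and is not the code of conformal_class. It uses a through-origin least-squares line in place of the 'rf' base model, applies no normalization, and the names split_conformal, alpha and calib_ratio are hypothetical.

```python
import math
import random


def split_conformal(x_train, y_train, alpha=0.1, calib_ratio=0.5, seed=42):
    """Fit on one part of the data and calibrate an interval half-width on the rest."""
    idx = list(range(len(x_train)))
    random.Random(seed).shuffle(idx)
    n_cal = int(len(idx) * calib_ratio)
    cal_idx, fit_idx = idx[:n_cal], idx[n_cal:]

    # Placeholder base model (assumption): least-squares slope through the origin.
    num = sum(x_train[i] * y_train[i] for i in fit_idx)
    den = sum(x_train[i] ** 2 for i in fit_idx)
    slope = num / den

    # Absolute-residual conformity scores on the held-out calibration set.
    scores = sorted(abs(y_train[i] - slope * x_train[i]) for i in cal_idx)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)), as a 0-based index.
    k = min(len(scores) - 1, math.ceil((len(scores) + 1) * (1 - alpha)) - 1)
    half_width = scores[k]

    def predict(x):
        centre = slope * x
        return centre - half_width, centre + half_width

    return predict


rng = random.Random(0)
xs = [rng.uniform(1.0, 5.0) for _ in range(200)]
ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
predict_interval = split_conformal(xs, ys)
low, high = predict_interval(2.0)
```

Because the half-width is the same for every input here, all intervals have equal length; a normalized variant would scale it per input by an estimate of local difficulty.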

climb.tool.impl.data_suite.models.copula module

climb.tool.impl.data_suite.models.copula.fit_sample_copula(clean_corpus, copula='vine', copula_n_samples=10, columns=None, random_seed=42)[source]

> The function takes a corpus of data, fits a copula to it, and then samples from the copula

Parameters:
  • clean_corpus – the corpus of data you want to fit the copula to.

  • copula – the type of copula to use. Defaults to vine

  • copula_n_samples – The number of samples to generate from the copula. Defaults to 10

  • columns – The names of the columns in the dataframe.

  • random_seed – The random seed. Defaults to 42
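To make the fit-then-sample idea concrete, here is a small two-column sketch. It swaps in a Gaussian copula rather than the vine copula that the function defaults to, uses only the standard library, and assumes columns without ties; every name in it (normal_scores, empirical_quantile, fit_sample_gaussian_copula) is hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def normal_scores(column):
    """Map values to standard-normal scores through their ranks."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    scores = [0.0] * len(column)
    for rank, i in enumerate(order, start=1):
        scores[i] = _STD_NORMAL.inv_cdf(rank / (len(column) + 1))
    return scores


def empirical_quantile(column, u):
    """Value of the sorted column at probability level u (nearest rank)."""
    ordered = sorted(column)
    return ordered[min(len(ordered) - 1, int(u * len(ordered)))]


def fit_sample_gaussian_copula(col_a, col_b, n_samples=10, seed=42):
    """Fit a two-dimensional Gaussian copula and draw n_samples rows from it."""
    za, zb = normal_scores(col_a), normal_scores(col_b)
    # The correlation of the normal scores is the single copula parameter.
    rho = sum(a * b for a, b in zip(za, zb)) / math.sqrt(
        sum(a * a for a in za) * sum(b * b for b in zb)
    )
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlate the second draw with the first, guarding against rounding.
        z2 = rho * e1 + math.sqrt(max(0.0, 1.0 - rho * rho)) * e2
        u1, u2 = _STD_NORMAL.cdf(e1), _STD_NORMAL.cdf(z2)
        # Map the uniform levels back through each column's empirical quantiles.
        samples.append((empirical_quantile(col_a, u1), empirical_quantile(col_b, u2)))
    return rho, samples


col_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
col_b = [2.2, 6.1, 3.9, 8.3, 12.1, 9.8]
rho, samples = fit_sample_gaussian_copula(col_a, col_b, n_samples=10)
```

Sampling through empirical quantiles means every generated value already occurs in the original column; a smoother marginal model would be needed to produce unseen values.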

climb.tool.impl.data_suite.models.ensemble module

class climb.tool.impl.data_suite.models.ensemble.ensemble(epochs=10, lr=0.01, batch_size=5, n_models=5, device='cpu')[source]

Bases: object

fit(x_train, y_train)[source]

> This function fits an ensemble of n_models to the data

Parameters:
  • x_train – the training data

  • y_train – the training labels

predict(x_test, y_test, mc_samples=3)[source]

> For each model we get the predictions and the uncertainty.

Parameters:
  • x_test – the test data

  • y_test – the true labels of the test set

  • mc_samples – number of Monte Carlo samples to use for prediction. Defaults to 3

Returns:

The mean of the predictions and the standard deviation of the predictions.
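The mean-and-spread aggregation across ensemble members can be sketched as follows. This is an illustration under assumptions, not the class's code: each member is a through-origin least-squares fit on a bootstrap resample rather than a neural network, and fit_slope, fit_ensemble and predict_ensemble are hypothetical names. How the real predict combines mc_samples with the members is not documented here, so the sketch leaves it out.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares slope through the origin; a stand-in for one ensemble member."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_ensemble(xs, ys, n_models=5, seed=0):
    """Fit n_models members, each on a bootstrap resample of the data (assumption)."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_models):
        picks = [rng.randrange(len(xs)) for _ in xs]
        members.append(fit_slope([xs[i] for i in picks], [ys[i] for i in picks]))
    return members


def predict_ensemble(members, x_test):
    """Return the mean and standard deviation across members for each test point."""
    per_point = [[slope * x for slope in members] for x in x_test]
    means = [statistics.fmean(p) for p in per_point]
    stds = [statistics.pstdev(p) for p in per_point]
    return means, stds


rng = random.Random(1)
xs = [rng.uniform(1.0, 5.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
members = fit_ensemble(xs, ys)
means, stds = predict_ensemble(members, [1.0, 4.0])
```

Disagreement between members is what the standard deviation measures, so it grows for inputs where the fitted models diverge.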

climb.tool.impl.data_suite.models.mcd module

climb.tool.impl.data_suite.models.mcd.all_equal2(iterator)[source]
class climb.tool.impl.data_suite.models.mcd.mc_dropout(epochs=10, lr=0.01, batch_size=5, device='cpu')[source]

Bases: object

fit(x_train, y_train)[source]

> The function instantiates a model, trains it, and then checks if the predictions are all equal. If they are, it instantiates a new model and repeats the process

Parameters:
  • x_train – the training data

  • y_train – the training labels
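The retrain-on-collapse behaviour described for fit can be sketched as a small retry loop. This is a hedged illustration only: all_equal follows the well-known itertools recipe and is assumed to resemble the module's all_equal2 helper, while make_model, fit_until_varied and the max_restarts cap are hypothetical and not taken from the source.

```python
from itertools import groupby


def all_equal(values):
    """True when every element is the same (also True for an empty input)."""
    groups = groupby(values)
    return next(groups, True) and not next(groups, False)


def fit_until_varied(make_model, probe_inputs, max_restarts=5):
    """Re-instantiate and retrain until predictions on probe_inputs differ."""
    for attempt in range(max_restarts):
        model = make_model(attempt)
        preds = [model(x) for x in probe_inputs]
        if not all_equal(preds):
            return model, attempt
    raise RuntimeError("every restart produced constant predictions")


def make_model(attempt):
    # Hypothetical: the first two initialisations collapse to a constant output.
    if attempt < 2:
        return lambda x: 0.5
    return lambda x: 0.1 * x


model, attempts_used = fit_until_varied(make_model, [1.0, 2.0, 3.0])
```

A model whose predictions are identical for every input carries no usable uncertainty signal, which is presumably why such a check is worth the extra training runs.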

predict(x_test, y_test, mc_samples=3)[source]

> The function takes in the test data and test labels, and returns the predictions and the uncertainty of the predictions

Parameters:
  • x_test – the test data

  • y_test – the actual labels of the test set

  • mc_samples – number of Monte Carlo samples to use for prediction. Defaults to 3

Returns:

The predictions and the uncertainty of the predictions.

climb.tool.impl.data_suite.models.nn_conformal module

class climb.tool.impl.data_suite.models.nn_conformal.LearnerOptimized(model, optimizer_class, loss_func, device='cpu', test_ratio=0.2, random_state=0)[source]

Bases: object

Fit a neural network (conditional mean) to training data

fit(x, y, epochs, batch_size, verbose=False)[source]

Fit the model to data

Parameters:
  • x (numpy array, containing the training features (nXp))

  • y (numpy array, containing the training labels (n))

  • epochs (integer, maximal number of epochs)

  • batch_size (integer, mini-batch size for SGD)

predict(x)[source]

Estimate the label given the features

Parameters:

x (numpy array of training features (nXp))

Returns:

ret_val

Return type:

numpy array of predicted labels (n)

class climb.tool.impl.data_suite.models.nn_conformal.MSENet_RegressorAdapter(model, fit_params=None, in_shape=1, hidden_size=1, learn_func=<class 'torch.optim.adam.Adam'>, epochs=1000, batch_size=10, dropout=0.1, lr=0.01, wd=1e-06, test_ratio=0.2, random_state=0)[source]

Bases: RegressorAdapter

fit(x, y)[source]

> The function takes in a set of inputs and outputs, and uses them to train the model

Parameters:
  • x – The input data

  • y – The target values.

predict(x)[source]

> The predict function takes in a single data point and returns the prediction

Parameters:

x – the input data

Returns:

The predicted value of the input x

set_fit_request(*, x: bool | None | str = '$UNCHANGED$') MSENet_RegressorAdapter

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') MSENet_RegressorAdapter

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.

Returns:

self – The updated object.

Return type:

object

class climb.tool.impl.data_suite.models.nn_conformal.mse_model(in_shape=1, hidden_size=64, dropout=0.5)[source]

Bases: Module

Conditional mean estimator, formulated as neural net

build_model()[source]

Construct the network

forward(x)[source]

Run forward pass

init_weights()[source]

Initialize the network parameters

climb.tool.impl.data_suite.models.nn_conformal.train_loop(model, loss_func, x_train, y_train, batch_size, optimizer, cnt=0, best_cnt=inf)[source]

> The function defines a training loop for the model

Parameters:
  • model – the model we’re training

  • loss_func – the loss function we want to use

  • x_train – the training data

  • y_train – the training labels

  • batch_size – The number of samples to use for each gradient update.

  • optimizer – the optimizer to use.

  • cnt – the number of batches we’ve trained on. Defaults to 0

  • best_cnt – the number of epochs to train for.

Returns:

The epoch loss and the count
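A training loop of this shape can be sketched without PyTorch. The example below is an illustration under assumptions rather than the module's implementation: it trains a one-parameter model y = weight * x with squared-error loss, passes a learning rate lr instead of an optimizer object, returns the weight alongside the loss and count, and treats best_cnt as an upper bound on the batch counter, which is a guess about its role.

```python
import math
import random


def train_loop(weight, x_train, y_train, batch_size, lr, cnt=0, best_cnt=math.inf, seed=0):
    """Run one epoch of mini-batch SGD on y = weight * x with squared-error loss.

    Returns the updated weight, the mean batch loss, and the running batch count.
    """
    order = list(range(len(x_train)))
    random.Random(seed).shuffle(order)
    batch_losses = []
    for start in range(0, len(order), batch_size):
        if cnt >= best_cnt:  # assumed stopping rule, not confirmed by the source
            break
        batch = order[start:start + batch_size]
        errors = [weight * x_train[i] - y_train[i] for i in batch]
        batch_losses.append(sum(e * e for e in errors) / len(batch))
        # Gradient of the mean squared error with respect to weight.
        grad = 2.0 * sum(e * x_train[i] for e, i in zip(errors, batch)) / len(batch)
        weight -= lr * grad
        cnt += 1
    epoch_loss = sum(batch_losses) / len(batch_losses) if batch_losses else 0.0
    return weight, epoch_loss, cnt


xs = [0.1 * i for i in range(1, 21)]
ys = [3.0 * x for x in xs]
weight, cnt, losses = 0.0, 0, []
for epoch in range(30):
    weight, loss, cnt = train_loop(weight, xs, ys, batch_size=5, lr=0.05, cnt=cnt, seed=epoch)
    losses.append(loss)
```

Threading cnt through successive calls keeps a running total of gradient updates across epochs.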

climb.tool.impl.data_suite.models.representation module

class climb.tool.impl.data_suite.models.representation.AutoEncoder(input_shape, encode_dim)[source]

Bases: object

bottleneck(x_test)[source]

The bottleneck function takes an input and returns the bottleneck compressed representation

Parameters:

x_test – The input data to be encoded.

Returns:

The bottleneck features of the input data.

fit(x_train)[source]

The function takes in the training data and trains the autoencoder for 100 epochs with a batch size of 8

Parameters:

x_train – The training data

climb.tool.impl.data_suite.models.representation.compute_representation(train, test, copula_samples, n_components=2, rep_type='pca', seed=42)[source]

> This function takes in the training and test data, the copula samples, and the number of components to use for the representation. It then standardizes the data, and uses either PCA or an autoencoder to compute the representation

Parameters:
  • train – the training data

  • test – the test data

  • copula_samples – the samples from the copula

  • n_components – the number of dimensions to reduce to. Defaults to 2

  • rep_type – the type of representation to use. Can be either “pca” or “ae”. Defaults to pca

  • seed – random seed. Defaults to 42

Returns:

the train, test and copula samples in the new representation.
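The standardize-then-project pipeline can be illustrated with the standard library alone. This sketch makes several simplifying assumptions: it keeps a single principal component instead of n_components=2, finds it by power iteration rather than a library PCA, and learns the scaler from the training rows only; fit_scaler, scale, first_component and project are hypothetical helpers.

```python
import math
import statistics


def fit_scaler(rows):
    """Column-wise mean and standard deviation, learned from the training rows only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    return means, stds


def scale(rows, means, stds):
    """Apply a previously fitted scaler to any set of rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def first_component(rows, n_iter=200):
    """Leading principal direction of already-centred rows, via power iteration."""
    dim = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(dim)] for i in range(dim)]
    vec = [1.0 / (i + 1) for i in range(dim)]  # asymmetric start vector
    for _ in range(n_iter):
        vec = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


def project(rows, direction):
    """One-dimensional representation: dot product of each row with the direction."""
    return [sum(v * d for v, d in zip(row, direction)) for row in rows]


train = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
test = [[2.5, 5.0]]
copula_samples = [[1.5, 3.0], [3.5, 7.0]]

means, stds = fit_scaler(train)
direction = first_component(scale(train, means, stds))
train_rep, test_rep, copula_rep = (
    project(scale(rows, means, stds), direction) for rows in (train, test, copula_samples)
)
```

Reusing the training scaler and direction for the test and copula rows keeps all three sets in the same coordinate system, so distances between them remain comparable.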

climb.tool.impl.data_suite.models.representation.representation_class_based(train, copula_samples, n_components=2, rep_type='pca', seed=42)[source]

> This function computes a representation of the data. It first standardizes the training data and the copula samples, then applies PCA to the standardized data, and finally returns the PCA components of the training data, the PCA components of the copula samples, the PCA object, and the scaler object

Parameters:
  • train – the training data

  • copula_samples – the samples from the copula

  • n_components – The number of components to keep. Defaults to 2

  • rep_type – the type of representation to use. Currently only PCA is supported. Defaults to pca

  • seed – the random seed. Defaults to 42

Returns:

the transformed training data, the transformed copula samples, the PCA object, and the scaler object.

Module contents