climb.tool.impl.data_suite.models package
Submodules
climb.tool.impl.data_suite.models.base_model module
- class climb.tool.impl.data_suite.models.base_model.MyDataset(data, targets, transform=None)
Bases: Dataset
- class climb.tool.impl.data_suite.models.base_model.Net(dim)
Bases: Module
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
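To illustrate the note above, here is a minimal PyTorch sketch. `TinyNet` is a hypothetical stand-in for `Net(dim)` (its real layers are not documented here); the point is that a forward hook registered on the module fires when the instance is called as `model(x)`, but not when `model.forward(x)` is called directly.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for Net(dim); the real layers are not documented."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(dim, 8), nn.ReLU(), nn.Linear(8, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The recipe for the forward pass is defined here...
        return self.layers(x)


model = TinyNet(dim=4)
hook_calls = []
# ...but hooks registered on the module only run through Module.__call__.
handle = model.register_forward_hook(
    lambda module, inputs, output: hook_calls.append(tuple(output.shape))
)

x = torch.randn(3, 4)
y_call = model(x)            # preferred: runs the registered forward hook
y_direct = model.forward(x)  # same output, but the hook on `model` is skipped
handle.remove()

print(len(hook_calls))  # 1
```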
- class climb.tool.impl.data_suite.models.base_model.benchmark_trainer(model, device)
Bases: object
climb.tool.impl.data_suite.models.benchmarks module
- climb.tool.impl.data_suite.models.benchmarks.comparison_methods(x_train, y_train, x_test, y_test, inlier_ids, df_inlier, model_type, return_ids=True, seed=42)
> This function takes in the training and test data, the inlier ids, the dataframe of inliers, the model type, and a seed. It fits the comparison model specified by model_type and returns its uncertainty scores.
- Parameters:
x_train – the training data
y_train – the training labels
x_test – the test data
y_test – the true values of the test set
inlier_ids – the indices of the inliers
df_inlier – the dataframe of inliers
model_type – the type of model to use. Can be one of the following:
return_ids – If True, returns the ids of the most uncertain and least uncertain samples. If False, returns the uncertainty scores. Defaults to True
seed – random seed. Defaults to 42
- Returns:
The uncertainty scores, or the ids of the most and least uncertain samples if return_ids is True.
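When return_ids is True, the function reports the most and least uncertain samples. A minimal NumPy sketch of that selection step, assuming higher scores mean more uncertainty; the helper name `split_by_uncertainty` and the cutoff `k` are hypothetical, since the number of returned ids is not documented here.

```python
import numpy as np


def split_by_uncertainty(scores, ids, k):
    """Hypothetical helper: return the ids of the k most and k least uncertain samples.

    `scores` holds one uncertainty score per sample (higher = more uncertain).
    """
    scores = np.asarray(scores)
    ids = np.asarray(ids)
    order = np.argsort(scores, kind="stable")  # ascending uncertainty
    least_uncertain = ids[order[:k]]
    most_uncertain = ids[order[::-1][:k]]
    return most_uncertain, least_uncertain


scores = [0.9, 0.1, 0.5, 0.7, 0.2]
ids = [10, 11, 12, 13, 14]
most, least = split_by_uncertainty(scores, ids, k=2)
print(most.tolist(), least.tolist())  # [10, 13] [11, 14]
```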
- climb.tool.impl.data_suite.models.benchmarks.conformal(x_train, y_train, x_test, y_test, inlier_ids)
- climb.tool.impl.data_suite.models.benchmarks.sample_copula(x_train, y_train, x_test, y_test, inlier_ids)
- climb.tool.impl.data_suite.models.benchmarks.uncertainty_benchmark(x_train, y_train, x_test, y_test, y_test_ids, ids, model_type, wandb_dict, conformal_dict=None)
> This function takes in a model type and trains the model. It performs the full uncertainty benchmark: besides training the model, it also computes uncertainty metrics and out-of-distribution (OOD) metrics, which are logged to wandb for the synthetic experiment.
- Parameters:
x_train – the training data
y_train – the training labels
x_test – the test data
y_test – the true values of the test set
y_test_ids – the true labels of the test set
ids – the indices of the inliers in the test set
model_type – the type of model to use. Can be one of the following:
wandb_dict – a dictionary that will be used to store the results of the experiment.
conformal_dict – a dictionary of dataframes containing the conformal predictions for each feature.
- Returns:
The wandb_dict dictionary, which contains the results of the uncertainty benchmark.
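The specific OOD metrics computed here are not listed. As one common example (an assumption, not necessarily what this function logs), the sketch below measures how well uncertainty values separate outliers from inliers with AUROC, computed from pairwise comparisons, and stores it under a hypothetical `ood_auroc` key in a results dictionary.

```python
import numpy as np


def auroc(scores, is_outlier):
    """AUROC via the Mann-Whitney U statistic (ties count as 1/2).

    Measures how well uncertainty `scores` rank outliers above inliers.
    """
    scores = np.asarray(scores, dtype=float)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    pos, neg = scores[is_outlier], scores[~is_outlier]
    # Compare every outlier score with every inlier score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


wandb_dict = {}  # plain dict standing in for the results dictionary
scores = [0.2, 0.3, 0.8, 0.9, 0.4]
is_outlier = [False, False, True, True, False]
wandb_dict["ood_auroc"] = auroc(scores, is_outlier)  # hypothetical key name
print(wandb_dict)  # {'ood_auroc': 1.0}
```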
climb.tool.impl.data_suite.models.conformal module
- class climb.tool.impl.data_suite.models.conformal.conformal_class(base_name='rf', norm_name='knn', conformity_score='abs', normalize=True, input_dim=2, seed=42)
Bases: object
- fit(x_train, y_train)
> This function splits the training data into a proper training set and a calibration set, then uses them to fit the conformal predictor.
- Parameters:
x_train – The training data.
y_train – The target variable
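The fit step follows the usual split-conformal recipe. A minimal NumPy sketch under stated assumptions: a least-squares line replaces the default random-forest base model (base_name='rf'), the score is the absolute residual (as in conformity_score='abs'), normalization is left out, and `fit_split_conformal`, `alpha`, and `calib_frac` are hypothetical names.

```python
import numpy as np


def fit_split_conformal(x, y, alpha=0.1, calib_frac=0.5, seed=42):
    """Split conformal fit with an absolute-residual conformity score.

    Hypothetical helper: a least-squares line stands in for the base model.
    Returns the fitted coefficients and the calibrated residual quantile q.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_cal = int(len(x) * calib_frac)
    cal, tr = idx[:n_cal], idx[n_cal:]

    # 1. Fit the base regressor on the proper training set only.
    design = np.column_stack([x[tr], np.ones(len(tr))])
    coef, *_ = np.linalg.lstsq(design, y[tr], rcond=None)

    # 2. Conformity scores |y - y_hat| on the held-out calibration set.
    scores = np.abs(y[cal] - (coef[0] * x[cal] + coef[1]))

    # 3. Finite-sample corrected quantile level ceil((n + 1)(1 - alpha)) / n.
    level = min(1.0, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)
    q = np.quantile(scores, level, method="higher")
    return coef, q


rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2 * x + 1 + rng.normal(0, 1, 200)
coef, q = fit_split_conformal(x, y)

# Intervals for fresh points are y_hat ± q; check empirical coverage.
x_new = rng.uniform(0, 10, 1000)
y_new = 2 * x_new + 1 + rng.normal(0, 1, 1000)
y_hat = coef[0] * x_new + coef[1]
coverage = np.mean(np.abs(y_new - y_hat) <= q)
```

Keeping the calibration points out of the base-model fit is what makes the residual quantile a valid, finite-sample interval width.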
- predict(x_test, y_test, just_conf=False)
> This function takes in the test data and returns a dataframe with the confidence intervals, the true values, and the normalized confidence intervals.
- Parameters:
x_test – the test data
y_test – the true values of the test set
just_conf – If True, only return the confidence intervals. Defaults to False
- Returns:
A dataframe with the model's predicted intervals (only the confidence intervals if just_conf is True).
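With normalize=True, interval widths are scaled by an estimate of how hard each point is (the class defaults to norm_name='knn'). A minimal sketch, assuming a k-nearest-neighbour difficulty estimate σ(x) built from training residuals, a normalized score |y - ŷ| / (σ(x) + β), and an already calibrated quantile q. The helper names, the column names, and β are hypothetical. The interval is ŷ ± q(σ(x) + β), so harder regions get wider intervals.

```python
import numpy as np
import pandas as pd


def knn_difficulty(x_query, x_train, abs_resid_train, k=5):
    """Hypothetical difficulty estimate sigma(x): the mean absolute residual
    of the k nearest training points (1-D inputs for simplicity)."""
    dist = np.abs(x_query[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return abs_resid_train[nearest].mean(axis=1)


def normalized_intervals(y_hat, sigma, q, y_true, beta=0.1):
    """Intervals y_hat ± q * (sigma + beta).

    `q` is assumed to be a calibrated quantile of the normalized scores
    |y - y_hat| / (sigma + beta); `beta` keeps widths positive when sigma is 0.
    """
    half_width = q * (sigma + beta)
    return pd.DataFrame({
        "lower": y_hat - half_width,
        "upper": y_hat + half_width,
        "y_true": y_true,
        "width": 2 * half_width,
    })


x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
abs_resid = np.array([0.1, 0.1, 0.1, 1.0, 1.0, 1.0])  # noisier for larger x
x_test = np.array([0.5, 4.5])
sigma = knn_difficulty(x_test, x_train, abs_resid, k=2)
df = normalized_intervals(np.array([1.0, 9.0]), sigma, q=2.0, y_true=np.array([1.2, 8.0]))
```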
climb.tool.impl.data_suite.models.copula module
- climb.tool.impl.data_suite.models.copula.fit_sample_copula(clean_corpus, copula='vine', copula_n_samples=10, columns=None, random_seed=42)
> This function fits a copula to a corpus of data and then draws samples from the fitted copula.
- Parameters:
clean_corpus – the corpus of data you want to fit the copula to.
copula – the type of copula to use. Defaults to vine
copula_n_samples – The number of samples to generate from the copula. Defaults to 10
columns – The names of the columns in the dataframe.
random_seed – The random seed. Defaults to 42
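The fit-then-sample idea can be sketched with a Gaussian copula in NumPy. This is a swapped-in, simpler technique: the documented function defaults to a vine copula, which is not shown here. The steps are: rank-transform each column to (0, 1), map to normal scores, estimate their correlation, sample correlated normals, and map the samples back through each column's empirical quantiles. The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

_std_normal = NormalDist()
_ppf = np.vectorize(_std_normal.inv_cdf)  # standard-normal quantile function
_cdf = np.vectorize(_std_normal.cdf)      # standard-normal CDF


def fit_sample_gaussian_copula(data, n_samples=10, seed=42):
    """Fit a Gaussian copula to `data` (n_rows x n_cols) and draw new rows."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # 1. Empirical CDF of each column: ranks 1..n scaled into (0, 1).
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1)
    # 2. Normal scores and their correlation matrix (the copula parameter).
    z = _ppf(u)
    corr = np.corrcoef(z, rowvar=False)
    # 3. Sample correlated normals and map them back to uniforms.
    rng = np.random.default_rng(seed)
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u_new = _cdf(z_new)
    # 4. Invert each column's empirical marginal via its quantiles.
    return np.column_stack([np.quantile(data[:, j], u_new[:, j]) for j in range(d)])


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.1 * rng.normal(size=500)  # strongly dependent second column
data = np.column_stack([a, b])
samples = fit_sample_gaussian_copula(data, n_samples=1000)
```

Separating the marginals (step 4) from the dependence structure (steps 2 and 3) is the defining feature of a copula model; a vine copula replaces the single correlation matrix with a tree of pairwise copulas.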
climb.tool.impl.data_suite.models.ensemble module
- class climb.tool.impl.data_suite.models.ensemble.ensemble(epochs=10, lr=0.01, batch_size=5, n_models=5, device='cpu')
Bases: object
- fit(x_train, y_train)
> This function fits an ensemble of n_models models to the training data.
- Parameters:
x_train – the training data
y_train – the training labels
- predict(x_test, y_test, mc_samples=3)
> For each model in the ensemble, this function computes the predictions and the uncertainty.
- Parameters:
x_test – the test data
y_test – the true labels of the test set
mc_samples – number of Monte Carlo samples to use for prediction. Defaults to 3
- Returns:
The mean of the predictions and the standard deviation of the predictions.
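The ensemble's uncertainty is the spread of its members' predictions. A minimal NumPy sketch of that idea, with least-squares linear models fitted on bootstrap resamples standing in for the neural networks the class trains. Bootstrap resampling is an assumed source of diversity, the class name is hypothetical, and mc_samples is omitted.

```python
import numpy as np


class BootstrapLinearEnsemble:
    """Hypothetical stand-in for `ensemble`: n_models least-squares fits,
    each trained on a bootstrap resample of the training data."""

    def __init__(self, n_models=5, seed=42):
        self.n_models = n_models
        self.rng = np.random.default_rng(seed)
        self.coefs = []

    @staticmethod
    def _design(x):
        x = np.asarray(x, dtype=float).reshape(len(x), -1)
        return np.column_stack([x, np.ones(len(x))])  # add intercept column

    def fit(self, x_train, y_train):
        X, y = self._design(x_train), np.asarray(y_train, dtype=float)
        self.coefs = []
        for _ in range(self.n_models):
            idx = self.rng.integers(0, len(X), size=len(X))  # bootstrap resample
            coef, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
            self.coefs.append(coef)
        return self

    def predict(self, x_test):
        X = self._design(x_test)
        preds = np.stack([X @ c for c in self.coefs])  # (n_models, n_test)
        # Mean prediction, and member disagreement as the uncertainty.
        return preds.mean(axis=0), preds.std(axis=0)


rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3 * x + rng.normal(0, 0.1, 50)
ens = BootstrapLinearEnsemble(n_models=5).fit(x, y)
mean, std = ens.predict(np.array([0.5, 5.0]))  # 5.0 lies far outside the training range
```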
climb.tool.impl.data_suite.models.mcd module
- class climb.tool.impl.data_suite.models.mcd.mc_dropout(epochs=10, lr=0.01, batch_size=5, device='cpu')
Bases: object
- fit(x_train, y_train)
> This function instantiates a model, trains it, and then checks whether its predictions are all equal. If they are, it instantiates a new model and repeats the process.
- Parameters:
x_train – the training data
y_train – the training labels
- predict(x_test, y_test, mc_samples=3)
> This function takes in the test data and test labels, and returns the predictions together with their uncertainty.
- Parameters:
x_test – the test data
y_test – the actual labels of the test set
mc_samples – number of Monte Carlo samples to use for prediction. Defaults to 3
- Returns:
The predictions and the uncertainty of the predictions.
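Monte Carlo dropout keeps dropout active at prediction time and treats the spread over several stochastic forward passes as the uncertainty. A minimal NumPy sketch for a one-hidden-layer network, swapped in for the PyTorch model: the weights, the dropout rate `p`, and the function name are hypothetical, while `mc_samples` mirrors the documented parameter.

```python
import numpy as np


def mc_dropout_predict(x, w1, b1, w2, b2, p=0.5, mc_samples=3, seed=0):
    """Monte Carlo dropout for a one-hidden-layer ReLU network.

    Each of the `mc_samples` forward passes draws a fresh dropout mask.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(mc_samples):
        h = np.maximum(0.0, x @ w1 + b1)  # hidden layer
        keep = rng.random(h.shape) >= p   # keep each unit with prob. 1 - p
        h = h * keep / (1.0 - p)          # inverted-dropout rescaling
        outputs.append(h @ w2 + b2)
    outputs = np.stack(outputs)           # (mc_samples, n, 1)
    return outputs.mean(axis=0), outputs.std(axis=0)


rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))
w1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
mean, std = mc_dropout_predict(x, w1, b1, w2, b2, p=0.5, mc_samples=50)
```

With p=0 every pass is identical and the standard deviation collapses to zero, which shows that the uncertainty comes entirely from the random masks.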
climb.tool.impl.data_suite.models.nn_conformal module
- class climb.tool.impl.data_suite.models.nn_conformal.LearnerOptimized(model, optimizer_class, loss_func, device='cpu', test_ratio=0.2, random_state=0)
Bases: object
Fits a neural network (conditional mean estimator) to the training data.
- class climb.tool.impl.data_suite.models.nn_conformal.MSENet_RegressorAdapter(model, fit_params=None, in_shape=1, hidden_size=1, learn_func=<class 'torch.optim.adam.Adam'>, epochs=1000, batch_size=10, dropout=0.1, lr=0.01, wd=1e-06, test_ratio=0.2, random_state=0)
Bases: RegressorAdapter
- fit(x, y)
> This function trains the model on a set of inputs and their target values.
- Parameters:
x – The input data
y – The target values.
- predict(x)
> This function takes in a single data point and returns its prediction.
- Parameters:
x – the input data
- Returns:
The predicted value of the input x
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → MSENet_RegressorAdapter
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → MSENet_RegressorAdapter
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class climb.tool.impl.data_suite.models.nn_conformal.mse_model(in_shape=1, hidden_size=64, dropout=0.5)
Bases: Module
Conditional mean estimator, formulated as a neural network.
- climb.tool.impl.data_suite.models.nn_conformal.train_loop(model, loss_func, x_train, y_train, batch_size, optimizer, cnt=0, best_cnt=inf)
> This function runs a minibatch training loop for the model.
- Parameters:
model – the model we’re training
loss_func – the loss function we want to use
x_train – the training data
y_train – the training labels
batch_size – The number of samples to use for each gradient update.
optimizer – the optimizer to use.
cnt – the number of batches we’ve trained on. Defaults to 0
best_cnt – the number of epochs to train for.
- Returns:
The epoch loss and the updated batch count cnt.
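A loop of this kind shuffles the data, takes one gradient step per minibatch, and counts the batches seen. A minimal NumPy sketch of one epoch for a linear model under mean squared error, swapped in for the PyTorch model, loss_func, and optimizer arguments. The function name and learning rate are hypothetical, and the best_cnt stopping rule is not modeled.

```python
import numpy as np


def sgd_epoch(w, b, x, y, batch_size, lr, rng, cnt=0):
    """One epoch of minibatch SGD on mean squared error for y ≈ x @ w + b.

    Returns the updated parameters, the mean batch loss for the epoch, and
    the running batch count `cnt` (incremented once per gradient update).
    """
    order = rng.permutation(len(x))  # reshuffle every epoch
    losses = []
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = x[idx], y[idx]
        err = xb @ w + b - yb
        losses.append(np.mean(err ** 2))
        # Gradient of the batch MSE with respect to w and b.
        w = w - lr * 2 * xb.T @ err / len(idx)
        b = b - lr * 2 * err.mean()
        cnt += 1
    return w, b, float(np.mean(losses)), cnt


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = x @ np.array([1.5, -2.0]) + 0.5  # noise-free linear target
w, b, cnt = np.zeros(2), 0.0, 0
for epoch in range(50):
    w, b, epoch_loss, cnt = sgd_epoch(w, b, x, y, batch_size=10, lr=0.05, rng=rng, cnt=cnt)
```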
climb.tool.impl.data_suite.models.representation module
- class climb.tool.impl.data_suite.models.representation.AutoEncoder(input_shape, encode_dim)
Bases: object
- climb.tool.impl.data_suite.models.representation.compute_representation(train, test, copula_samples, n_components=2, rep_type='pca', seed=42)
> This function takes in the training and test data, the copula samples, and the number of components to use for the representation. It standardizes the data and then uses either PCA or an autoencoder to compute the representation.
- Parameters:
train – the training data
test – the test data
copula_samples – the samples from the copula
n_components – the number of dimensions to reduce to. Defaults to 2
rep_type – the type of representation to use. Can be either “pca” or “ae”. Defaults to pca
seed – random seed. Defaults to 42
- Returns:
the train, test and copula samples in the new representation.
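For rep_type='pca', the steps can be sketched in NumPy as follows. This assumes the standardization statistics and principal components are estimated on the training data only and then applied to the test data and copula samples; the function name is hypothetical, and the autoencoder path is not shown.

```python
import numpy as np


def pca_representation(train, test, copula_samples, n_components=2):
    """Standardize with training-set statistics, then project all three arrays
    onto the top principal components of the standardized training data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns

    def standardize(a):
        return (a - mean) / std

    # Standardized training data is already centered, so its SVD gives the
    # principal directions directly (rows of vt, sorted by singular value).
    _, _, vt = np.linalg.svd(standardize(train), full_matrices=False)
    components = vt[:n_components]

    def project(a):
        return standardize(a) @ components.T

    return project(train), project(test), project(copula_samples)


rng = np.random.default_rng(0)
train = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
test = rng.normal(size=(50, 5))
copula_samples = rng.normal(size=(30, 5))
train_rep, test_rep, copula_rep = pca_representation(train, test, copula_samples)
```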
- climb.tool.impl.data_suite.models.representation.representation_class_based(train, copula_samples, n_components=2, rep_type='pca', seed=42)
> This function computes a representation of the data. It first standardizes the training data and the copula samples, then applies PCA to the standardized data, and finally returns the PCA components of the training data, the PCA components of the copula samples, the PCA object, and the scaler object.
- Parameters:
train – the training data
copula_samples – the samples from the copula
n_components – The number of components to keep. Defaults to 2
rep_type – the type of representation to use. Currently only PCA is supported. Defaults to pca
seed – the random seed. Defaults to 42
- Returns:
the transformed training data, the transformed copula samples, the PCA object, and the scaler object.
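Returning the fitted scaler and PCA object means new data can later be mapped into the same representation. A minimal sketch with scikit-learn's StandardScaler and PCA, assuming the scaler is fitted on the training data only (the source does not say how the copula samples enter the fit); the function name is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_with_fitted_objects(train, copula_samples, n_components=2, seed=42):
    """Return both representations plus the fitted PCA and scaler objects."""
    scaler = StandardScaler().fit(train)  # assumption: statistics from train only
    pca = PCA(n_components=n_components, random_state=seed)
    train_rep = pca.fit_transform(scaler.transform(train))
    copula_rep = pca.transform(scaler.transform(copula_samples))
    return train_rep, copula_rep, pca, scaler


rng = np.random.default_rng(0)
train = rng.normal(size=(100, 4))
copula_samples = rng.normal(size=(20, 4))
train_rep, copula_rep, pca, scaler = pca_with_fitted_objects(train, copula_samples)

# Because the fitted objects are returned, new data can be mapped later.
new_rep = pca.transform(scaler.transform(rng.normal(size=(5, 4))))
```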