climb.tool.impl.data_suite.data package

Submodules

climb.tool.impl.data_suite.data.data_loader module

climb.tool.impl.data_suite.data.data_loader.corrupt_data_func(data, feat_list, mean=0, variance=1, proportion=0.5, dist='normal')

> Takes a dataframe, a list of features to corrupt, and a noise distribution. Corrupts the listed features with noise drawn from that distribution and returns the corrupted data, the original data, a list of the corrupted data points, a list of the noise added to the data, and a list of the indices of the corrupted data points.

Parameters:

data – the data you want to corrupt
feat_list – the list of features to corrupt
mean – the mean of the distribution you want to sample from. Defaults to 0
variance – the variance of the noise. Defaults to 1
proportion – the proportion of data that will be corrupted. Defaults to 0.5
dist – the distribution of the noise. Defaults to 'normal'

Returns:

corrupt_data, data, corrupt_ids, noise, noise_id
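The corruption step described above can be illustrated with a minimal sketch. This is not the library's implementation: `corrupt_sketch`, the `seed` argument, the use of a NumPy array with column indices in place of a dataframe with feature names, the additive form of the noise, and the `"uniform"` branch are all assumptions made for illustration. It also returns four values rather than the five listed above, because the exact contents of `corrupt_ids` are not specified here.

```python
import numpy as np


def corrupt_sketch(data, feat_idx, mean=0.0, variance=1.0,
                   proportion=0.5, dist="normal", seed=0):
    """Hypothetical stand-in for corrupt_data_func (illustration only)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_rows = data.shape[0]

    # Choose which rows to corrupt, without repeats.
    n_corrupt = int(round(proportion * n_rows))
    noise_id = np.sort(rng.choice(n_rows, size=n_corrupt, replace=False))

    shape = (n_corrupt, len(feat_idx))
    if dist == "normal":
        noise = rng.normal(mean, np.sqrt(variance), size=shape)
    elif dist == "uniform":
        # Assumption: a uniform distribution with the same mean and variance,
        # which has half-width sqrt(3 * variance).
        half_width = np.sqrt(3.0 * variance)
        noise = rng.uniform(mean - half_width, mean + half_width, size=shape)
    else:
        raise ValueError(f"unsupported dist: {dist!r}")

    # Add the noise to the selected rows and columns of a copy,
    # leaving the input array unchanged.
    corrupt = data.copy()
    corrupt[np.ix_(noise_id, feat_idx)] += noise
    return corrupt, data, noise, noise_id


clean = np.zeros((10, 3))
corrupted, original, noise, noise_id = corrupt_sketch(clean, feat_idx=[0, 2])
print(noise_id)  # indices of the rows that received noise
```

Returning the noise and the row indices alongside the data makes it possible to check afterwards exactly which points were perturbed and by how much.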

climb.tool.impl.data_suite.data.data_loader.generate_synthetic_large(num_samples=1000)

> Generates random samples from a multivariate normal distribution. The mean and covariance matrix are not arguments of this function.

Parameters:

num_samples – The number of samples to generate. Defaults to 1000

Returns:

A tuple of two NumPy arrays for train and test

climb.tool.impl.data_suite.data.data_loader.generate_synthetic_small(num_samples=1000)

> Generates a random sample of data from a multivariate normal distribution. The mean and covariance matrix are not arguments of this function.

Parameters:

num_samples – The number of samples to generate. Defaults to 1000

Returns:

A tuple of two NumPy arrays for train and test
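As a rough illustration of what the two generators describe, the sketch below draws samples from a multivariate normal distribution and splits them into train and test arrays. The function name `generate_synthetic_sketch`, the two-dimensional mean and covariance, the `seed` argument, and the 70/30 split are all assumptions; the values actually used by `generate_synthetic_small` and `generate_synthetic_large` are not documented here.

```python
import numpy as np


def generate_synthetic_sketch(num_samples=1000, seed=0):
    """Hypothetical illustration; not the library's generator."""
    rng = np.random.default_rng(seed)

    # Assumed distribution parameters, chosen only for the example.
    mean = np.zeros(2)
    cov = np.array([[1.0, 0.5],
                    [0.5, 1.0]])

    samples = rng.multivariate_normal(mean, cov, size=num_samples)

    # Assumed 70/30 split; integer arithmetic avoids float rounding surprises.
    n_train = (num_samples * 7) // 10
    return samples[:n_train], samples[n_train:]


train, test = generate_synthetic_sketch(num_samples=1000)
print(train.shape, test.shape)
```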

climb.tool.impl.data_suite.data.data_loader.load_adult_data(split_size=0.3)

> This function loads the adult dataset, removes all the rows with missing values, and then splits the data into a training and test set

Parameters:

split_size – The proportion of the dataset to include in the test split. Defaults to 0.3

Returns:

X_train, X_test, y_train, y_test, X, y
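The two steps named in the description (dropping rows with missing values, then splitting) can be sketched on an in-memory array. This does not load the Adult dataset and is not the library's code: `dropna_and_split`, the `seed` argument, the use of NaN to mark missing values, and the shuffled split with the test size rounded up are assumptions for illustration.

```python
import numpy as np


def dropna_and_split(X, y, split_size=0.3, seed=0):
    """Hypothetical sketch: drop incomplete rows, then split train/test."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Keep only rows with no missing (NaN) feature values.
    keep = ~np.isnan(X).any(axis=1)
    X, y = X[keep], y[keep]

    # Shuffle row indices and reserve a split_size fraction for the test set.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(np.ceil(split_size * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]

    return X[train_idx], X[test_idx], y[train_idx], y[test_idx], X, y


features = np.arange(20, dtype=float).reshape(10, 2)
features[1, 0] = np.nan  # simulate a missing value
labels = np.arange(10)
X_train, X_test, y_train, y_test, X, y = dropna_and_split(features, labels)
print(X.shape, X_train.shape, X_test.shape)
```

Returning the cleaned `X` and `y` alongside the splits mirrors the six-value return listed above.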

climb.tool.impl.data_suite.data.data_loader.load_electric(path='electricity.arff')

> This function loads the electric dataset from the file, encodes the class labels, and returns the training and test sets

Parameters:

path – the path to the dataset. Defaults to 'electricity.arff'

Returns:

X_train, X_test, y_train, y_test
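The label-encoding step mentioned in the description can be sketched as follows. File parsing is not shown (an ARFF file could be read with, for example, `scipy.io.arff.loadarff`), and the helper name `encode_labels` and the example label values are assumptions, not taken from the library.

```python
import numpy as np


def encode_labels(labels):
    """Hypothetical sketch: map class labels to integer codes."""
    # np.unique sorts the distinct labels; return_inverse gives, for each
    # input label, its position in that sorted list.
    classes, encoded = np.unique(np.asarray(labels), return_inverse=True)
    return encoded, classes


encoded, classes = encode_labels(["UP", "DOWN", "UP", "DOWN"])
print(classes, encoded)
```

Keeping `classes` makes it possible to map integer predictions back to the original labels.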

climb.tool.impl.data_suite.data.data_loader.load_synthetic_data(n_synthetic=1000, mean=0, noise_variance=0, dim='small', prop='0.5', dist='normal')

> This function generates a synthetic dataset with a specified number of samples, mean, noise variance, dimensionality, proportion of noise, and distribution of noise

Parameters:

n_synthetic – number of samples to generate. Defaults to 1000
mean – mean of the noise distribution. Defaults to 0
noise_variance – the variance of the noise distribution. Defaults to 0
dim – “small” or “large”. Defaults to small
prop – proportion of data to corrupt. Defaults to '0.5' (a string in the signature, unlike the numeric proportion default of corrupt_data_func)
dist – the distribution of the noise. Can be “normal” or “uniform”. Defaults to normal

Module contents